Dear Solr users,

I would appreciate it if someone could help me out here. My goal is to index a CSV file.
First of all, I am using the CDH 5 beta distribution of Hadoop, which includes Solr 4.4.0, on a single node. I am following the Hue tutorial on indexing and searching the data from the Yelp dataset challenge:
http://gethue.tumblr.com/post/65969470780/hadoop-tutorials-season-ii-7-how-to-index-and-search

Following the tutorial, I uploaded the config files, including the prepared schema.xml, to ZooKeeper via the solrctl command:

>solrctl instancedir --create reviews [path to conf]

After this, I created the collection via:

>solrctl collection --create reviews -s 1

This works fine: I can see the collection in the Solr Admin Web UI and the instancedir in the ZooKeeper shell.

Then, using the MapReduceIndexerTool and the provided morphline file, the index is created and uploaded to Solr. According to the command output, the index was created successfully:

1481 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Indexing 1 files using 1 real mappers into 1 reducers
52716 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Done. Indexing 1 files using 1 real mappers into 1 reducers took 51.233 secs
52774 [main] INFO org.apache.solr.hadoop.GoLive - Live merging of output shards into Solr cluster...
52829 [pool-4-thread-1] INFO org.apache.solr.hadoop.GoLive - Live merge hdfs://svr-hdp01:8020/tmp/load/results/part-00000 into http://SVR-HDP01:8983/solr
53017 [pool-4-thread-1] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
53495 [main] INFO org.apache.solr.hadoop.GoLive - Committing live merge...
53496 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config:
53512 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Waiting for client to connect to ZooKeeper
53513 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@19014023 name:ZooKeeperConnection Watcher:SVR-HDP01:2181/solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
53513 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Client is connected to ZooKeeper
53514 [main] INFO org.apache.solr.common.cloud.ZkStateReader - Updating cluster state from ZooKeeper...
53652 [main] INFO org.apache.solr.hadoop.GoLive - Done committing live merge
53652 [main] INFO org.apache.solr.hadoop.GoLive - Live merging of index shards into Solr cluster took 0.878 secs
53652 [main] INFO org.apache.solr.hadoop.GoLive - Live merging completed successfully
53652 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Succeeded with job: jobName: org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, jobId: job_1388405934175_0013
53653 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Success. Done. Program took 53.719 secs. Goodbye.

Now, when I go to the web UI and select the created core, I find it empty: it shows 0 docs, and querying it returns no results.

My question is whether I have to upload the CSV file manually to somewhere on the Solr server. It seems the CSV file was parsed and indexed successfully, but the indexed data is missing.

I hope the description of the problem is clear enough. Thanks a lot!
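In case it helps to reproduce the empty result outside the web UI, here is a minimal sketch of how the document count could be checked over HTTP. It assumes the Solr base URL from the GoLive log line (http://SVR-HDP01:8983/solr), the collection name reviews, and the standard select handler with wt=json. The function names are only illustrative and are not part of the tutorial.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions: base URL copied from the GoLive log line, collection
# name from the solrctl commands. Adjust both for another setup.
SOLR_BASE = "http://SVR-HDP01:8983/solr"
COLLECTION = "reviews"


def count_url(base, collection):
    """Build a select URL that matches every document but returns no rows."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base}/{collection}/select?{params}"


def num_found(response_text):
    """Read response.numFound from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def fetch_count(base=SOLR_BASE, collection=COLLECTION):
    """Query the live server and return the number of indexed documents."""
    with urlopen(count_url(base, collection), timeout=10) as resp:
        return num_found(resp.read().decode("utf-8"))
```

Calling fetch_count() against the running server should report the same number of documents that the Admin UI shows for the collection, which in my case is 0.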
Kind regards
__________________
initions AG
Chi-Hao Huynh
Weidestraße 120a
D-22081 Hamburg
t: +49 (0) 40 / 41 49 60-62
f: +49 (0) 40 / 41 49 60-11
e: hu...@initios.com
w: www.initions.com

Full company name: initions innovative IT solutions AG
Registered office: Hamburg
Commercial register: Hamburg B 83929
Chairman of the supervisory board: Dr. Michael Leue
Executive board: Dr. Stefan Anschütz, André Paul Henkel, Dr. Helge Plehn