Dear solr users,

I would appreciate it if someone could help me out here. My goal is to index a 
CSV file.

First of all, I am using the CDH 5 beta distribution of Hadoop, which includes 
Solr 4.4.0, on a single node. I am following the Hue tutorial to index and 
search the data from the Yelp dataset challenge: 
http://gethue.tumblr.com/post/65969470780/hadoop-tutorials-season-ii-7-how-to-index-and-search

Following the tutorial, I uploaded the config files, including the 
prepared schema.xml, to ZooKeeper via the solrctl command:
>solrctl instancedir --create reviews [path to conf]

After this, I created the collection via:
>solrctl collection --create reviews -s 1

This works fine: I can see the created collection in the Solr Admin web UI 
and the instancedir in the ZooKeeper shell.
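
For reference, the same check can be done outside the web UI. The following is a minimal sketch, assuming the Solr base URL that appears in the log output further down (http://SVR-HDP01:8983/solr) and the standard CoreAdmin STATUS request. The helper names and the core name used in the usage note are hypothetical. It only reads the status and changes nothing on the server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL copied from the GoLive log lines in this mail.
SOLR_BASE = "http://SVR-HDP01:8983/solr"


def core_status_url(base=SOLR_BASE):
    """Build the CoreAdmin STATUS URL that reports every core on this node."""
    return base + "/admin/cores?" + urlencode({"action": "STATUS", "wt": "json"})


def doc_counts(status_json):
    """Map core name -> numDocs from a CoreAdmin STATUS response body."""
    status = json.loads(status_json).get("status", {})
    return {name: info.get("index", {}).get("numDocs")
            for name, info in status.items()}


def fetch_doc_counts(base=SOLR_BASE):
    """Ask the live server (needs network access to the Solr node)."""
    with urlopen(core_status_url(base)) as resp:
        return doc_counts(resp.read().decode("utf-8"))
```

Calling fetch_doc_counts() on the node should return one entry per core, for example a core belonging to the reviews collection (the exact core name, such as reviews_shard1_replica1, is a guess and depends on the setup).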

Then I ran the MapReduceIndexerTool with the provided morphline file to create 
the index and upload it to Solr. According to the command output, the index was 
created successfully:

1481 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Indexing 1 
files using 1 real mappers into 1 reducers
52716 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Done. 
Indexing 1 files using 1 real mappers into 1 reducers took 51.233 secs
52774 [main] INFO  org.apache.solr.hadoop.GoLive  - Live merging of output 
shards into Solr cluster...
52829 [pool-4-thread-1] INFO  org.apache.solr.hadoop.GoLive  - Live merge 
hdfs://svr-hdp01:8020/tmp/load/results/part-00000 into 
http://SVR-HDP01:8983/solr
53017 [pool-4-thread-1] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil  
- Creating new http client, 
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
53495 [main] INFO  org.apache.solr.hadoop.GoLive  - Committing live merge...
53496 [main] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil  - Creating 
new http client, config:
53512 [main] INFO  org.apache.solr.common.cloud.ConnectionManager  - Waiting 
for client to connect to ZooKeeper
53513 [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  
- Watcher org.apache.solr.common.cloud.ConnectionManager@19014023 
name:ZooKeeperConnection Watcher:SVR-HDP01:2181/solr got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None
53513 [main] INFO  org.apache.solr.common.cloud.ConnectionManager  - Client is 
connected to ZooKeeper
53514 [main] INFO  org.apache.solr.common.cloud.ZkStateReader  - Updating 
cluster state from ZooKeeper...
53652 [main] INFO  org.apache.solr.hadoop.GoLive  - Done committing live merge
53652 [main] INFO  org.apache.solr.hadoop.GoLive  - Live merging of index 
shards into Solr cluster took 0.878 secs
53652 [main] INFO  org.apache.solr.hadoop.GoLive  - Live merging completed 
successfully
53652 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Succeeded 
with job: jobName: org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, 
jobId: job_1388405934175_0013
53653 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Success. 
Done. Program took 53.719 secs. Goodbye.
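
Since the mail client wrapped the log lines, here is a small sketch that joins them back into one record per log statement and pulls out what GoLive merged and where. It assumes every record starts with "<millis> [<thread>] <LEVEL>", as in the output above. The function names are made up for illustration, and the sketch only reads the text; it does not diagnose anything by itself.

```python
import re

# A new record starts with "<millis> [<thread>] <LEVEL>", as in the output above.
RECORD_START = re.compile(r"^\d+ \[[^\]]+\] [A-Z]+\s")


def unwrap_log(lines):
    """Join mail-wrapped continuation lines back onto the record they belong to."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def merge_targets(records):
    """Return (source shard, target URL) pairs from GoLive 'Live merge' records."""
    pattern = re.compile(r"GoLive\s+- Live merge (\S+) into (\S+)")
    return [m.groups() for m in map(pattern.search, records) if m]
```

Applied to the output above, merge_targets(unwrap_log(lines)) yields the HDFS shard directory together with the Solr URL it was merged into, which makes it easier to compare against the core shown in the web UI.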

Now, when I go to the web UI and select the created core, I find the core to be 
empty: the number of docs is 0 and querying it returns no results. My question 
is whether I have to upload the CSV file manually to somewhere on the Solr 
server. It seems as if the CSV file was parsed and indexed successfully, but 
the indexed data is missing.
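
To rule out a display problem in the web UI, the document count can also be read directly from the select handler. This is a minimal sketch, assuming the collection name reviews from the commands above and the base URL http://SVR-HDP01:8983/solr from the log. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(base, collection):
    """Build a match-all query that returns only the hit count (rows=0)."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return "{}/{}/select?{}".format(base, collection, params)


def num_found(body):
    """Extract response.numFound from a JSON select response body."""
    return json.loads(body)["response"]["numFound"]


def fetch_num_found(base, collection):
    """Run the query against the live server (needs network access)."""
    with urlopen(count_url(base, collection)) as resp:
        return num_found(resp.read().decode("utf-8"))
```

In my setup this would be called as fetch_num_found("http://SVR-HDP01:8983/solr", "reviews"). A result of 0 would match what the web UI shows.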

I hope the description of the problem is clear enough. Thanks a lot!
Kind regards

__________________
initions AG
Chi-Hao Huynh
Weidestraße 120a
D-22081 Hamburg

t:   +49 (0) 40 / 41 49 60-62
f:   +49 (0) 40 / 41 49 60-11
e:  hu...@initios.com
w: www.initions.com
Full company name: initions innovative IT solutions AG
Registered office: Hamburg
Commercial register: Hamburg B 83929
Chairman of the supervisory board: Dr. Michael Leue
Executive board: Dr. Stefan Anschütz, André Paul Henkel, Dr. Helge Plehn
