Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Jimmy Lin
unsubscribe

On Sat, Aug 22, 2015 at 9:31 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com
wrote:

 Hi,

 I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr.

 However, I find that clustering is exceedingly slow after I index this 1GB of
 data. It took almost 30 seconds to return the cluster results when I set it
 to cluster the top 1000 records, and it still takes more than 3 seconds when I
 set it to cluster the top 100 records.

 Is this speed normal? As I understand it, Solr can index terabytes of data
 without its performance being impacted much, but now the collection is
 slowing down even with just 1GB of data.

 Below is my clustering configuration in solrconfig.xml.

  <requestHandler name="/clustering"
                  startup="lazy"
                  enable="${solr.clustering.enabled:true}"
                  class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">1000</int>
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="df">text</str>
      <str name="fl">null</str>

      <bool name="clustering">true</bool>
      <bool name="clustering.results">true</bool>
      <str name="carrot.title">subject content tag</str>
      <bool name="carrot.produceSummary">true</bool>

      <int name="carrot.fragSize">20</int>
      <!-- the maximum number of labels per cluster -->
      <int name="carrot.numDescriptions">20</int>
      <!-- produce sub clusters -->
      <bool name="carrot.outputSubClusters">false</bool>
      <str name="LingoClusteringAlgorithm.desiredClusterCountBase">7</str>

      <!-- Configure the remaining request handler parameters. -->
      <str name="defType">edismax</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>


 Regards,
 Edwin
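As far as I understand Solr's clustering component, Carrot2 clusters only the documents returned by the request, so its cost grows with rows (and with how much text each returned document contributes) rather than with the total index size. With carrot.produceSummary=true, the highlighter also runs over every returned document to build the snippets that get clustered. A minimal sketch for comparing variants is to override those handler defaults per request and compare QTime; the host, port and core name collection1 in BASE_URL are assumptions, while the parameter names come from the configuration above.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host, port and core name to your installation.
BASE_URL = "http://localhost:8983/solr/collection1/clustering"


def clustering_url(query, rows, produce_summary):
    """Build a /clustering request that overrides two handler defaults.

    Only rows and carrot.produceSummary are overridden; every other
    parameter falls back to the defaults in solrconfig.xml.
    """
    params = {
        "q": query,
        "rows": rows,
        "carrot.produceSummary": "true" if produce_summary else "false",
        "wt": "json",
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Send each variant and compare QTime to see which setting dominates.
    for rows in (100, 1000):
        for summary in (True, False):
            print(clustering_url("*:*", rows, summary))
```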



Optimal setup for multiple tools

2014-04-26 Thread Jimmy Lin
Hello,

My team has been working with SOLR for the last 2 years.  We have two main
indices:

1. documents
   - index and store main text
   - one record for each document
2. places (all of the geospatial places found in the documents above)
   - index but don't store main text
   - one record for each place. A single document could contain thousands
     of places, but overall the ratio has come out to about 6:1 places to
     documents

We have several tools that query the above indices.  One is just a standard
search tool that returns documents filtered on keyword, temporal, and
geospatial filters.  Another is a geospatial tool that queries the places
collection.  We now have a requirement to provide document highlighting
when querying in the geospatial tool.

Does anyone have any suggestions/prior experience on how they would set up
two collections that are essentially different views of the data? Also,
any tips on how to ensure that these two collections stay in sync (meaning
any documents indexed into the documents collection are also properly
indexed in places)?
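One way to keep two views in step is to derive both from the same source record in a single pass, with every place record carrying the id of the document it came from, and then check for orphans. A minimal sketch under those assumptions; the SourceDocument shape, the field names doc_id, name, lat, lon, text and the id scheme are all hypothetical, and sending the records to Solr is left out.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDocument:
    # Hypothetical shape of a document before it is indexed.
    doc_id: str
    text: str
    places: list = field(default_factory=list)  # list of (name, lat, lon)


def build_records(doc):
    """Return (document_record, place_records) built from one source document."""
    doc_record = {"id": doc.doc_id, "text": doc.text}
    place_records = [
        {
            # Deterministic ids make re-indexing the same document idempotent.
            "id": f"{doc.doc_id}_place{i}",
            "doc_id": doc.doc_id,
            "name": name,
            "lat": lat,
            "lon": lon,
        }
        for i, (name, lat, lon) in enumerate(doc.places)
    ]
    return doc_record, place_records


def orphan_places(document_ids, place_records):
    """Ids of place records whose parent is missing from the documents collection."""
    known = set(document_ids)
    return [p["id"] for p in place_records if p["doc_id"] not in known]
```

Because both record sets come out of one function call, a failure can be retried for the whole source document rather than for one collection only.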

Thanks a lot,

Jimmy Lin


SOLR and Kerberos enabled HDFS

2014-03-03 Thread Jimmy
Hello,

I am trying to connect SOLR (tried 4.4 and 4.7) to Kerberos-enabled HDFS.
I am using Cloudera CDH 4.2.1:
http://maven-repository.com/artifact/com.cloudera.cdh/cdh-root/4.2.1/pom_effective

The keytab and principal are valid (I tested them with Flume as well as with
the plain HDFS CLI).


Has anybody successfully connected SOLR 4.x to CDH 4.2.1?



<str name="solr.hdfs.security.kerberos.enabled">${solr.hdfs.security.kerberos.enabled:true}</str>
<str name="solr.hdfs.security.kerberos.keytabfile">${solr.hdfs.security.kerberos.keytabfile:/my.keytab}</str>
<str name="solr.hdfs.security.kerberos.principal">${solr.hdfs.security.kerberos.principal:m...@mydomain.com}</str>


I am getting the following error:


HTTP Status 500 - {msg=SolrCore 'collection1' is not available due to init failure: java.io.IOException: Login failure for m...@mydomain.com from keytab /my.keytab,
trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: java.io.IOException: Login failure for m...@mydomain.com from keytab /my.keytab
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:251)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: java.io.IOException: Login failure for me@MYDOMAIN.COM from keytab /my.keytab
at org.apache.solr.core.HdfsDirectoryFactory.initKerberos(HdfsDirectoryFactory.java:282)
at org.apache.solr.core.HdfsDirectoryFactory.init(HdfsDirectoryFactory.java:90)
at org.apache.solr.core.SolrCore.initDirectoryFactory(SolrCore.java:443)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:672)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
... 3 more
Caused by: java.io.IOException: Login failure for m...@mydomain.com from keytab /my.keytab
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:825)
at org.apache.solr.core.HdfsDirectoryFactory.initKerberos(HdfsDirectoryFactory.java:280)
... 16 more
Caused by: javax.security.auth.login.LoginException: java.lang.IllegalArgumentException: Illegal principal name m...@mydomain.com
at org.apache.hadoop.security.User.<init>(User.java:50)
at org.apache.hadoop.security.User.<init>(User.java:43)
at org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:159)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:769)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
at 
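The innermost cause is "Illegal principal name", which Hadoop raises when it cannot map the full principal to a short user name. As far as I know, Hadoop's DEFAULT auth_to_local rule only maps principals whose realm equals the default realm from krb5.conf, and that comparison is case-sensitive. The trace shows the realm both as mydomain.com and as MYDOMAIN.COM, although the lowercase copies may just be the archive's address masking. The sketch below is a simplified emulation of that one rule, not Hadoop's code; NoMatchingRule and the function names are hypothetical. It can be used to check a principal string against the realm you expect.

```python
import re

# primary[/instance]@REALM
_PRINCIPAL = re.compile(r"^([^/@]+)(?:/([^/@]+))?@([^/@]+)$")


class NoMatchingRule(Exception):
    """Raised when the simplified DEFAULT rule does not apply."""


def parse_principal(principal):
    """Split a Kerberos principal into (primary, instance, realm)."""
    m = _PRINCIPAL.match(principal)
    if m is None:
        raise ValueError(f"malformed principal: {principal!r}")
    return m.group(1), m.group(2), m.group(3)


def default_rule_short_name(principal, default_realm):
    """Simplified emulation of Hadoop's DEFAULT auth_to_local rule.

    Returns the first component only when the realm matches the default
    realm exactly; the comparison is case-sensitive, like realm names.
    """
    primary, _instance, realm = parse_principal(principal)
    if realm != default_realm:
        raise NoMatchingRule(f"no rule applies to {principal!r}")
    return primary
```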

Re: Adding attributes to Solr fields ?

2013-10-04 Thread jimmy nguyen
Hello,

and thank you for your answer, Shawn.

I tried to simplify my problem, but I realize I chose a bad example: I
don't process phone numbers, I process unstructured documents.

My GATE application might return several annotations for the same group of
words (because I'm using an ontology). For example, I will have an
Animal annotation, which marks the words cat, catfish and eider as
Animal(s), and (depending on the ontology used) the cat annotation will
have 2 features: Animal.class=mammal and Animal.class=cat, the catfish
annotation will have 1 feature, Animal.class=fish, and the more specific
term eider will have 2 features: Animal.class=bird and Animal.class=duck.

I don't want 1 Solr document for each animal; I really want 1 Solr
document for each actual document. I'd like to be able to query my Solr
index for bird and get all the documents containing the term bird or any
subclass or instance of it (like duck or eider). Since all the words bird,
duck and eider appearing in my documents will be tagged as Animal, and
there will be an annotation with Animal.class=bird, it is easy to get Solr
to return the right documents.

But since I get something like:

<result>
  <doc>
    <str name="id">hdfs://...</str>
    <arr name="animal">
      <str>cat</str>
      <str>cat</str>
      <str>catfish</str>
      <str>eider</str>
      <str>eider</str>
    </arr>
    <arr name="class">
      <str>mammal</str>
      <str>cat</str>
      <str>fish</str>
      <str>bird</str>
      <str>duck</str>
    </arr>
    <arr name="instance">
      <str>http://.../Animal#catfish</str>
      <str>http://.../Animal#eider</str>
      <str>http://.../Animal#eider</str>
    </arr>
  </doc>
  <doc>
    ...
  </doc>
  <doc>
    ...
  </doc>
</result>

... when I want to generate a snippet of the document and highlight the
terms whose appearance made Solr return the document (like the first
document, which contains eider, when the user is searching for bird), I'd
like to highlight the term eider in the snippet, but I don't know how to
do that. Having a correspondence between my Solr animal and class
fields (for example, an id attribute that would link them: <str
id="5">eider</str> and the same id for the class bird) would make it
easier to highlight my term eider.

What do you think?

Thanks !
Jim
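Stored multi-valued fields come back in the order their values were added, and in the result above the animal and class arrays already line up position by position (cat/mammal, cat/cat, catfish/fish, eider/bird, eider/duck). If the indexer always writes the two arrays in lockstep, the position itself can act as the link, without any extra attribute. A minimal sketch assuming that lockstep indexing; terms_for_class is a hypothetical helper, and the terms it returns would then be fed to whatever highlighting step builds the snippet.

```python
def terms_for_class(doc, wanted_class):
    """Return the surface terms in doc whose parallel class value matches.

    Assumes doc["animal"][i] and doc["class"][i] were written together,
    so position i links a term to one of its classes.
    """
    animals = doc.get("animal", [])
    classes = doc.get("class", [])
    if len(animals) != len(classes):
        raise ValueError("animal and class arrays are not aligned")
    terms = []
    for term, cls in zip(animals, classes):
        if cls == wanted_class and term not in terms:
            terms.append(term)
    return terms


doc = {
    "id": "hdfs://...",
    "animal": ["cat", "cat", "catfish", "eider", "eider"],
    "class": ["mammal", "cat", "fish", "bird", "duck"],
}

print(terms_for_class(doc, "bird"))  # ['eider']
```

The instance array in the same result has fewer entries than animal, so it cannot be linked the same way unless it is also written once per animal value.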


Adding attributes to Solr fields ?

2013-10-03 Thread jimmy nguyen
Hi all,

is it possible to add attributes to our Solr fields?

I'm indexing GATE-annotated documents into Solr. The annotations produced
by my GATE application usually have several features (for example,
Person.title, Person.name, Person.phoneNumber...).
Now each of my documents may contain more than one Person annotation, and
each person might have more than one phone number... Unfortunately I don't
know how to index all the features for one annotation in one field in Solr.

So instead, I would like to add an attribute id (or offset) to each of
the features I'm sending to Solr in order to be able to find out, for
example, which Person.name goes with which Person.phoneNumber.

So instead of:
<doc>
  <str name="id">1</str>
  <arr name="person"><str>Jane Doe</str><str>John Doe</str></arr>
  <arr name="person_phoneNumber"><str>0123456789</str><str>1234567890</str><str>2345678901</str></arr>
</doc>

I'd like to get something like this in Solr:
<doc>
  <str name="id">1</str>
  <arr name="person"><str id="1">Jane Doe</str><str id="2">John Doe</str></arr>
  <arr name="person_phoneNumber"><str id="1">0123456789</str><str id="1">1234567890</str><str id="2">2345678901</str></arr>
</doc>

This way it is easy to link the 2 first phone numbers to Jane Doe and the
last one to John Doe.

Any ideas?

Thanks !
Jim
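Solr values cannot carry XML attributes, but the same link can live inside the value itself. A minimal sketch, assuming a hypothetical "id|value" encoding for both person and person_phoneNumber (the separator must never occur in real data), plus the client-side decoding that pairs the values back up:

```python
from collections import defaultdict

SEP = "|"  # hypothetical separator; must never occur inside real values


def encode(annotation_id, value):
    return f"{annotation_id}{SEP}{value}"


def decode(encoded):
    annotation_id, value = encoded.split(SEP, 1)
    return annotation_id, value


def group_phones_by_person(doc):
    """Map each person name to the phone numbers sharing its annotation id."""
    names = dict(decode(v) for v in doc["person"])
    phones = defaultdict(list)
    for v in doc["person_phoneNumber"]:
        annotation_id, number = decode(v)
        phones[annotation_id].append(number)
    return {names[i]: phones.get(i, []) for i in names}


doc = {
    "id": "1",
    "person": [encode(1, "Jane Doe"), encode(2, "John Doe")],
    "person_phoneNumber": [
        encode(1, "0123456789"),
        encode(1, "1234567890"),
        encode(2, "2345678901"),
    ],
}
print(group_phones_by_person(doc))
# {'Jane Doe': ['0123456789', '1234567890'], 'John Doe': ['2345678901']}
```

The prefix becomes part of each indexed term, so searching by name or number alone would still need a separate, unprefixed field filled by the indexer.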


Indexing several sub-fields in one solr field

2013-09-19 Thread jimmy nguyen
Hello,

I'd like to index into Solr (4.4.0) documents that I previously annotated
with GATE (7.1).
I use Behemoth to be able to run my GATE application on a corpus of
documents on Hadoop, and then Behemoth allows me to directly send my
annotated documents to solr. But my question is not about the Behemoth or
Hadoop parts.

The annotations produced by my GATE application usually have several
features (for example, annotation type Person has the following features:
Person.title, Person.firstName, Person.lastName, Person.gender).
Each of my documents may contain more than one Person annotation, which is
why I would like to index all the features for one annotation in one field
in Solr.
How do I do that?

I thought I'd add the following lines to schema.xml:

<types>
...
<fieldType name="person" class="solr.StrField" subSuffix="_person" />
...
</types>
...
<fields>
...
<field name="personinfo" type="person" indexed="true" stored="true" multiValued="true" />
<dynamicField name="*_person" type="text_general" indexed="true" stored="false" />
...
</fields>


But as soon as I start my Solr instances and try to access Solr from my
browser, I get an HTTP ERROR 500:

Problem accessing /solr/. Reason:

{msg=SolrCore 'collection1' is not available due to init failure: Plugin Initializing failure for [schema.xml] fieldType,
trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Plugin Initializing failure for [schema.xml] fieldType
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:287)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Plugin Initializing failure for [schema.xml] fieldType
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:193)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:164)
at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:268)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at

Re: Indexing several sub-fields in one solr field

2013-09-19 Thread jimmy nguyen
Hello,

thanks for the answer. Sorry, I actually meant attribute subFieldSuffix.

So, in order to be able to index several features in one Solr field, should
I write a new Java class extending AbstractSubTypeFieldType? Or is
there another way to do it?

Thanks !
Jim
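If a custom AbstractSubTypeFieldType subclass feels like too much, another option (an indexing convention, not a Solr feature) is to pack all features of one annotation into a single value of the multiValued, stored personinfo field, one value per Person, and unpack them on the client. A minimal sketch using JSON as the hypothetical packing format; the sample persons are made-up input. Searching on an individual feature would still need separate fields, such as the *_person dynamic fields from the schema above.

```python
import json

FEATURES = ("title", "firstName", "lastName", "gender")


def pack(annotation):
    """Serialise one Person annotation's features into a single field value."""
    return json.dumps({k: annotation.get(k) for k in FEATURES}, sort_keys=True)


def unpack(value):
    """Turn a stored personinfo value back into a feature dict."""
    return json.loads(value)


# Made-up sample input: two Person annotations from one document.
persons = [
    {"title": "Mr", "firstName": "John", "lastName": "Doe", "gender": "male"},
    {"title": "Ms", "firstName": "Jane", "lastName": "Doe", "gender": "female"},
]
solr_doc = {"id": "1", "personinfo": [pack(p) for p in persons]}
```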


On Thu, Sep 19, 2013 at 4:05 PM, Jack Krupansky j...@basetechnology.com wrote:

 There is no such fieldType attribute as subSuffix. Solr is just
 complaining about extraneous, junk attributes. Delete the crap.

 -- Jack Krupansky

 -Original Message- From: jimmy nguyen
 Sent: Thursday, September 19, 2013 12:43 PM
 To: solr-user@lucene.apache.org
 Subject: Indexing several sub-fields in one solr field



Solr Fuzzy search on short string

2013-03-26 Thread Jimmy Dean


I did a fuzzy search on solr. The result is a little strange to me.

Query carj~ can match carl. But cari can't match carl.

As a matter of fact, car[x]~, [x]i can match carl.

Is this the correct behavior?
Jimmey
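Fuzzy matching in Lucene is based on the edit distance between the query term and indexed terms. A minimal plain-Levenshtein sketch (insertions, deletions, substitutions) shows that carj and cari are both exactly one substitution away from carl, so on edit distance alone they should behave the same. Note that if the second query really was sent as cari without the ~, it is an ordinary term query and only matches cari itself.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance using a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


for q in ("carj", "cari"):
    print(q, "->", levenshtein(q, "carl"))
# carj -> 1
# cari -> 1
```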

Re: complex keywords, hierarchical data, Solr representation problem

2012-01-09 Thread jimmy
Thanks for the fast reply. I went with your suggestion and saved the full
category path as well as the category_id as an integer. I also tested the
index space consumption, and it was less than I thought.

So, if I only store the category_id as an integer, I have a full index size
of 246MB.
With the full category path (just indexed and not stored) it's 275MB. Only
12% more.
If I also store the field, it goes up to 342MB, but I don't think this is
necessary.
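The size figures above, expressed relative to the 246MB baseline: the quoted 12% checks out, and storing the field as well adds about 39%.

```python
BASELINE_MB = 246  # index with category_id stored as an integer only


def growth_percent(size_mb, baseline_mb=BASELINE_MB):
    """Relative index growth in percent compared to the baseline."""
    return (size_mb - baseline_mb) / baseline_mb * 100


print(f"{growth_percent(275):.1f}%")  # 11.8%  (full path indexed, not stored)
print(f"{growth_percent(342):.1f}%")  # 39.0%  (full path indexed and stored)
```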


Ted Dunning wrote
 
 Option 3 is preferable because you can use phrase queries to get
 interesting results as in color > light > beige or color > light.
 
 Normalizing is bad in this kind of environment.
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/complex-keywords-hierarchical-data-Solr-representation-problem-tp3642588p3644563.html
Sent from the Solr - User mailing list archive at Nabble.com.


complex keywords, hierarchical data, Solr representation problem

2012-01-08 Thread jimmy
Hi,

I'm new to Solr and already highly impressed about its possibilities and
speed. Until now, I only have used a relational database (MySQL) and
programmed so far everything in php or Java.

Now, I'm stuck and don't know how to represent my data in a Solr Index.

To simplify things, first I want to give a partial representation of my
data.

First Table KEYWORDS:
keyword_id, keyword
1, white horse
2, black horse
3, dark dog
4, brown cat

Second Table CATEGORY:
category_id, parent_id, category, full_category
1, 0, color, color
2, 1, light, color > light
3, 2, white, color > light > white
4, 2, beige, color > light > beige
5, 1, dark, color > dark
6, 5, dark, color > dark > black
7, 5, dark, color > dark > brown
8, 0, animal, animal
9, 8, horse, animal > horse
10, 8, dog, animal > dog
11, 8, cat, animal > cat

Third Table ANIMAL:
animal_id, animal_name, keyword_ids, category_ids
1, Cathago, 1, 3:7
2, Zebra, 1:2, 3:6:7
3, Bello, 3, 5:10
4, Kitty, 7, 11

There are numerous possibilities to represent the data in Solr.

Solution 1:
Save all data the way it is represented in my relational database. If
someone searches for brown cat, I first look for the keyword, which results
in 3, then search with the value 3 in the animal table and finally fetch the
category 11. I know that with Solr 4 and the new join function, I could do
this in one query at some point in the future. The only benefit I see in this
solution is the same as in MySQL: saving space by not storing the same field
values twice.

Solution 2:
I just save all data based on the animal table, but denormalized.

animal_name: Kitty
keyword: brown cat
category: animal
category: cat
category: color
category: dark
category: brown

animal_name:Zebra
keyword: white horse
keyword: black horse
category: animal
category: horse
category: color
category: light
category: white
category: dark
category: black

Solution 3:
Similar to Solution 2, but with the deepest category only

animal_name: Kitty
keyword: brown cat
category: animal > cat
category: color > dark > brown

animal_name: Zebra
keyword: white horse
keyword: black horse
category: animal > horse
category: color > light > white
category: color > dark > black

As you can see, I'm very confused :-) The second solution gets kind of
messy, and the third solution requires me to extract the parent categories
by hand.

Can you recommend a smart way to solve this?
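For Solution 3, the parent categories do not have to be extracted by hand: every prefix of the full path is an ancestor. Solr's PathHierarchyTokenizerFactory performs this kind of prefix expansion at index time; the sketch below just shows the same idea on the client side, assuming " > " as the separator between levels and hypothetical helper names.

```python
SEP = " > "  # assumed separator between category levels


def ancestor_paths(full_category):
    """Expand 'color > dark > brown' into every ancestor path, root first."""
    parts = [p.strip() for p in full_category.split(SEP.strip())]
    return [SEP.join(parts[: i + 1]) for i in range(len(parts))]


def category_values(full_categories):
    """All values for a multiValued category field, without duplicates."""
    values = []
    for full in full_categories:
        for path in ancestor_paths(full):
            if path not in values:
                values.append(path)
    return values


print(category_values(["animal > horse", "color > light > white", "color > dark > black"]))
# ['animal', 'animal > horse', 'color', 'color > light', 'color > light > white', 'color > dark', 'color > dark > black']
```

Indexed into a string field, these values let a filter on color > dark match every animal under that branch, while the leaf paths stay available for the phrase-style queries Ted describes.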



Re: add CJKTokenizer to solr

2007-01-30 Thread zha jimmy

Thank you all, it works now :).

2007/1/30, James liu [EMAIL PROTECTED]:


He is OK now.


--
regards
jl




add CJKTokenizer to solr

2007-01-28 Thread zha jimmy

hi, all

I am trying to configure Solr to support Chinese tokenization.

I saw the tips in schema.xml:

   <!-- One can also specify an existing Analyzer class that has a
        default constructor via the class attribute on the analyzer element
   <fieldtype name="text_greek" class="solr.TextField">
     <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
   </fieldType>
   -->

Then I modified schema.xml:

  <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>

When I start Solr, I get this error: Caused by:
java.lang.ClassNotFoundException:
org.apache.lucene.analysis.cjk.CJKTokenizer.

I realized that Solr does not include the CJK package, but how can I add it?
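For background on what the missing class does: as far as I know, Lucene's CJKTokenizer splits runs of CJK characters into overlapping two-character tokens (bigrams) and keeps other words whole. A simplified sketch of that idea, not Lucene's implementation; it only treats the CJK Unified Ideographs block (U+4E00 to U+9FFF) as CJK, emits a lone CJK character as its own token, and lowercases other words, all of which are assumptions of the sketch.

```python
def is_cjk(ch):
    # Simplification: only the CJK Unified Ideographs block counts as CJK.
    return "\u4e00" <= ch <= "\u9fff"


def cjk_bigrams(text):
    """Overlapping bigrams for CJK runs; other alphanumeric runs kept as words."""
    tokens, run, word = [], [], []

    def flush_run():
        if len(run) == 1:
            tokens.append(run[0])
        tokens.extend(run[i] + run[i + 1] for i in range(len(run) - 1))
        run.clear()

    def flush_word():
        if word:
            tokens.append("".join(word).lower())
            word.clear()

    for ch in text:
        if is_cjk(ch):
            flush_word()
            run.append(ch)
        elif ch.isalnum():
            flush_run()
            word.append(ch)
        else:  # whitespace or punctuation ends the current run/word
            flush_run()
            flush_word()
    flush_run()
    flush_word()
    return tokens


print(cjk_bigrams("Solr 中文分词"))  # ['solr', '中文', '文分', '分词']
```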