semantic Search in Farsi News for more relevant results returned from search engines

2012-08-25 Thread Alireza Kh


 
Best regards,

- Forwarded Message -
From: Alireza Kh master_kh...@yahoo.com
To: u...@uima.apache.org u...@uima.apache.org 
Sent: Tuesday, August 21, 2012 4:14 PM
Subject: 
 

I am a graduate student; my name is Ali Raza Khodabakhshi. My thesis title
is "Semantic Search in Farsi News for more relevant results returned from
search engines". From the research I have done, I realized that the software
stack (Solr, Nutch, SIREn, UIMA) can help me with this, but I have doubts
about some aspects:

1- Do these applications fully support the Persian language?
2- For semantic search, are there other tools to add to the above list?
Faithfully yours
MSc, Computer Engineering (Software)

Re: semantic Search in Farsi News for more relevant results returned from search engines

2012-08-25 Thread Jack Krupansky
Could you detail the specific requirements for "fully support Persian
language"?


What are the qualities, aspects, and characteristics that need support, both 
for indexing of content and processing of queries?


-- Jack Krupansky

-Original Message- 
From: Alireza Kh

Sent: Saturday, August 25, 2012 6:20 AM
To: solr-user@lucene.apache.org
Subject: semantic Search in Farsi News for more relevant results returned 
from search engines





Best regards,

- Forwarded Message -
From: Alireza Kh master_kh...@yahoo.com
To: u...@uima.apache.org u...@uima.apache.org
Sent: Tuesday, August 21, 2012 4:14 PM
Subject:


I am a graduate student; my name is Ali Raza Khodabakhshi. My thesis title
is "Semantic Search in Farsi News for more relevant results returned from
search engines". From the research I have done, I realized that the software
stack (Solr, Nutch, SIREn, UIMA) can help me with this, but I have doubts
about some aspects:

1- Do these applications fully support the Persian language?
2- For semantic search, are there other tools to add to the above list?
Faithfully yours
MSc, Computer Engineering (Software)




RE: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

2012-08-25 Thread Fuad Efendi

This is a bug in the Solr 4.0.0-Beta Schema Browser: Load Term Info shows 9682
for "News", but a direct query shows 3577.

/solr/core0/select?q=channel:News&facet=true&facet.field=channel&rows=0

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="facet">true</str>
<str name="q">channel:News</str>
<str name="facet.field">channel</str>
<str name="rows">0</str>
</lst>
</lst>
<result name="response" numFound="3577" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="channel">
<int name="News">3577</int>
<int name="Blogs">0</int>
<int name="Message Boards">0</int>
<int name="Video">0</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>


-Original Message-
Sent: August-24-12 11:29 PM
To: solr-user@lucene.apache.org
Cc: sole-...@lucene.apache.org
Subject: RE: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser
Importance: High

Any news? 
CC: Dev


-Original Message-
Subject: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

Hi there,

Load Term Info shows 3650 for a specific term "MyTerm", and when I execute the
query channel:MyTerm it shows 650 documents found… possibly a bug… it
happens after I commit data too, nothing changes; and this field is a
single-valued, non-tokenized string.

-Fuad

--
Fuad Efendi
416-993-2060
http://www.tokenizer.ca






Re: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

2012-08-25 Thread Ryan McKinley
If you optimize the index, are the results the same?

maybe it is showing counts for deleted docs (i think it does... and
this is expected)

ryan


On Sat, Aug 25, 2012 at 9:57 AM, Fuad Efendi f...@efendi.ca wrote:

 This is a bug in the Solr 4.0.0-Beta Schema Browser: Load Term Info shows 9682
 for "News", but a direct query shows 3577.

 /solr/core0/select?q=channel:News&facet=true&facet.field=channel&rows=0

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">1</int>
 <lst name="params">
 <str name="facet">true</str>
 <str name="q">channel:News</str>
 <str name="facet.field">channel</str>
 <str name="rows">0</str>
 </lst>
 </lst>
 <result name="response" numFound="3577" start="0"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="channel">
 <int name="News">3577</int>
 <int name="Blogs">0</int>
 <int name="Message Boards">0</int>
 <int name="Video">0</int>
 </lst>
 </lst>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>


 -Original Message-
 Sent: August-24-12 11:29 PM
 To: solr-user@lucene.apache.org
 Cc: sole-...@lucene.apache.org
 Subject: RE: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser
 Importance: High

 Any news?
 CC: Dev


 -Original Message-
 Subject: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

 Hi there,

 Load Term Info shows 3650 for a specific term "MyTerm", and when I execute the
 query channel:MyTerm it shows 650 documents found… possibly a bug… it
 happens after I commit data too, nothing changes; and this field is a
 single-valued, non-tokenized string.

 -Fuad

 --
 Fuad Efendi
 416-993-2060
 http://www.tokenizer.ca





 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Solr Score threshold 'reasonably', independent of results returned

2012-08-25 Thread Ramzi Alqrainy
It will never return no results, because the cutoff is relative to the score of
the previous result:

if score < 0.25 * last_score then stop

Since score > 0 and last_score is 0 for the initial hit, it will not stop on
the first result.
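The stopping rule described above can be sketched in plain Java; the 0.25 ratio and the score values are only illustrative, and real Solr scores would come from the search response:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScoreCutoff {
    // Keep hits until one falls below a fraction of the previous hit's score.
    static List<Double> truncate(List<Double> scores, double ratio) {
        List<Double> kept = new ArrayList<>();
        double last = 0.0;
        for (double s : scores) {
            if (last > 0 && s < ratio * last) {
                break; // relative drop-off detected: stop collecting
            }
            kept.add(s);
            last = s;
        }
        return kept;
    }

    public static void main(String[] args) {
        // The first hit is always kept, since last_score starts at 0.
        System.out.println(truncate(Arrays.asList(9.1, 8.7, 8.5, 1.2, 1.1), 0.25));
        // prints [9.1, 8.7, 8.5]
    }
}
```

Note this never returns an empty list for a non-empty hit list, which is the point being made above.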



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Score-threshold-reasonably-independent-of-results-returned-tp4002312p4003247.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Score threshold 'reasonably', independent of results returned

2012-08-25 Thread Ramzi Alqrainy
You are right, Mr. Ravish, because this depends on the ranking and search-field
formula. But please allow me to point out that the Solr score can, in some
cases, help us decide whether a document is relevant or not.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Score-threshold-reasonably-independent-of-results-returned-tp4002312p4003248.html
Sent from the Solr - User mailing list archive at Nabble.com.


RecursivePrefixTreeStrategy class not found

2012-08-25 Thread Jones, Dan
According to the document I was reading here:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

First, you must register a spatial field type in the Solr schema.xml file. The
instructions in this whole document imply the RecursivePrefixTreeStrategy
(http://wiki.apache.org/solr/RecursivePrefixTreeStrategy) based field type
used in a geospatial context.

<fieldType name="geo"
   class="org.apache.solr.spatial.RecursivePrefixTreeFieldType"
   spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
   distErrPct="0.025"
   maxDetailDist="0.001"
/>


I need to set the fieldType to RecursivePrefixTreeStrategy
(http://wiki.apache.org/solr/RecursivePrefixTreeStrategy) and of course, I'm
getting "class not found". I'm using the latest Solr 4.0.0-BETA.

I have a field that I would like to import into solr that is a MULTIPOLYGON

For Example:
TUV  Tuvalu  MULTIPOLYGON (((179.21322733454343 -8.561290924154292, 
179.20240933453334 -8.465417924064994, 179.2183813345482 -8.481890924080346, 
179.2251453345545 -8.492217924089957, 179.23109133456006 -8.50491792410179, 
179.23228133456115 -8.51841792411436, 179.23149133456042 -8.533499924128407, 
179.22831833455746 -8.543426924137648, 179.22236333455191 -8.554145924147633, 
179.21322733454343 -8.561290924154292)), ((177.2902543327525 
-6.114445921875486, 177.28137233274424 -6.109863921871224, 177.27804533274116 
-6.099445921861516, 177.28137233274424 -6.089445921852203, 177.3055273327667 
-6.10597292186759, 177.2958093327577 -6.113890921874969, 177.2902543327525 
-6.114445921875486)), ((176.30636333183617 -6.288335922037433, 
176.29871833182904 -6.285135922034456, 176.29525433182584 -6.274581922024623, 
176.30601833183584 -6.260135922011173, 176.31198133184142 -6.28215492203168, 
176.30636333183617 -6.288335922037433)), ((178.69580033406152 
-7.484163923151129, 178.68885433405507 -7.480835923148035, 178.68878133405497 
-7.467572923135677, 178.7017813340671 -7.475208923142787, 178.69580033406152 
-7.484163923151129)))


Since the LSP was moved into Solr, would there be a different name for the 
class?
(I'm not sure the factory class above can be found yet either)

Any help would be much appreciated!





This communication (including all attachments) is intended solely for
the use of the person(s) to whom it is addressed and should be treated
as a confidential AAA communication. If you are not the intended
recipient, any use, distribution, printing, or copying of this email is
strictly prohibited. If you received this email in error, please
immediately delete it from your system and notify the originator. Your
cooperation is appreciated.


RE: RecursivePrefixTreeStrategy class not found

2012-08-25 Thread Jones, Dan
SORRY!

RecursivePrefixTreeFieldType cannot be found!




Sent: Saturday, August 25, 2012 6:30 PM
To: solr-user@lucene.apache.org
Subject: RecursivePrefixTreeStrategy class not found

According to the document I was reading here:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

First, you must register a spatial field type in the Solr schema.xml file. The
instructions in this whole document imply the RecursivePrefixTreeStrategy
(http://wiki.apache.org/solr/RecursivePrefixTreeStrategy) based field type
used in a geospatial context.

<fieldType name="geo"
   class="org.apache.solr.spatial.RecursivePrefixTreeFieldType"
   spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
   distErrPct="0.025"
   maxDetailDist="0.001"
/>


I need to set the fieldType to RecursivePrefixTreeStrategy
(http://wiki.apache.org/solr/RecursivePrefixTreeStrategy) and of course, I'm
getting "class not found". I'm using the latest Solr 4.0.0-BETA.

I have a field that I would like to import into solr that is a MULTIPOLYGON

For Example:
TUV  Tuvalu  MULTIPOLYGON (((179.21322733454343 -8.561290924154292, 
179.20240933453334 -8.465417924064994, 179.2183813345482 -8.481890924080346, 
179.2251453345545 -8.492217924089957, 179.23109133456006 -8.50491792410179, 
179.23228133456115 -8.51841792411436, 179.23149133456042 -8.533499924128407, 
179.22831833455746 -8.543426924137648, 179.22236333455191 -8.554145924147633, 
179.21322733454343 -8.561290924154292)), ((177.2902543327525 
-6.114445921875486, 177.28137233274424 -6.109863921871224, 177.27804533274116 
-6.099445921861516, 177.28137233274424 -6.089445921852203, 177.3055273327667 
-6.10597292186759, 177.2958093327577 -6.113890921874969, 177.2902543327525 
-6.114445921875486)), ((176.30636333183617 -6.288335922037433, 
176.29871833182904 -6.285135922034456, 176.29525433182584 -6.274581922024623, 
176.30601833183584 -6.260135922011173, 176.31198133184142 -6.28215492203168, 
176.30636333183617 -6.288335922037433)), ((178.69580033406152 
-7.484163923151129, 178.68885433405507 -7.480835923148035, 178.68878133405497 
-7.467572923135677, 178.7017813340671 -7.475208923142787, 178.69580033406152 
-7.484163923151129)))


Since the LSP was moved into Solr, would there be a different name for the 
class?
(I'm not sure the factory class above can be found yet either)

Any help would be much appreciated!








Re: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

2012-08-25 Thread Lance Norskog
The index directory will include files which list deleted documents.
(I do not remember the suffix.)

If you do not like this behavior, you can add 'expunge deletes' to
your commit requests.
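For reference, a sketch of such a commit in Solr's XML update-message format (attribute names per the 3.x/4.x update syntax; verify against your Solr version):

```xml
<!-- merge away deleted documents at commit time -->
<commit expungeDeletes="true"/>
```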

On Sat, Aug 25, 2012 at 10:27 AM, Ryan McKinley ryan...@gmail.com wrote:
 If you optimize the index, are the results the same?

 maybe it is showing counts for deleted docs (i think it does... and
 this is expected)

 ryan


 On Sat, Aug 25, 2012 at 9:57 AM, Fuad Efendi f...@efendi.ca wrote:

 This is a bug in the Solr 4.0.0-Beta Schema Browser: Load Term Info shows 9682
 for "News", but a direct query shows 3577.

 /solr/core0/select?q=channel:News&facet=true&facet.field=channel&rows=0

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">1</int>
 <lst name="params">
 <str name="facet">true</str>
 <str name="q">channel:News</str>
 <str name="facet.field">channel</str>
 <str name="rows">0</str>
 </lst>
 </lst>
 <result name="response" numFound="3577" start="0"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="channel">
 <int name="News">3577</int>
 <int name="Blogs">0</int>
 <int name="Message Boards">0</int>
 <int name="Video">0</int>
 </lst>
 </lst>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>


 -Original Message-
 Sent: August-24-12 11:29 PM
 To: solr-user@lucene.apache.org
 Cc: sole-...@lucene.apache.org
 Subject: RE: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser
 Importance: High

 Any news?
 CC: Dev


 -Original Message-
 Subject: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

 Hi there,

 Load Term Info shows 3650 for a specific term "MyTerm", and when I execute the
 query channel:MyTerm it shows 650 documents found… possibly a bug… it
 happens after I commit data too, nothing changes; and this field is a
 single-valued, non-tokenized string.

 -Fuad

 --
 Fuad Efendi
 416-993-2060
 http://www.tokenizer.ca









-- 
Lance Norskog
goks...@gmail.com


Re: How do I represent a group of customer key/value pairs

2012-08-25 Thread Lance Norskog
There are more advanced ways to embed hierarchy in records. This describes them:

http://wiki.apache.org/solr/HierarchicalFaceting

(This is a great page, never noticed it.)

On Fri, Aug 24, 2012 at 8:12 PM, Sheldon P sporc...@gmail.com wrote:
 Thanks for the prompt reply Jack.  Could you point me towards any code
 examples of that technique?


 On Fri, Aug 24, 2012 at 4:31 PM, Jack Krupansky j...@basetechnology.com 
 wrote:
 The general rule in Solr is simple: denormalize your data.

 If you have some maps (or tables) and a set of keys (columns) for each map
 (table), define fields with names like map-name_key-name, such as
 map1_name, map2_name, map1_field1, map2_field1. Solr has dynamic
 fields, so you can define map-name_* to have a desired type - if all the
 keys have the same type.

 -- Jack Krupansky

 -Original Message- From: Sheldon P
 Sent: Friday, August 24, 2012 3:33 PM
 To: solr-user@lucene.apache.org
 Subject: How do I represent a group of customer key/value pairs


 I've just started to learn Solr and I have a question about modeling data
 in the schema.xml.

 I'm using SolrJ to interact with my Solr server.  It's easy for me to store
 key/value pairs where the key is known.  For example, if I have:

 title="Some book title"
 author="The author's name"


 I can represent that data in the schema.xml file like this:

<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>

 I also have data that is stored as a Java HashMap, where the keys are
 unknown:

 Map<String, String> map = new HashMap<String, String>();
 map.put("some unknown key", "some unknown data");
 map.put("another unknown key", "more unknown data");


 I would prefer to store that data in Solr without losing its hierarchy.
 For example:

 <field name="map" type="maptype" indexed="true" stored="true">

 <field name="some unknown key" type="text_general" indexed="true" stored="true"/>

 <field name="another unknown key" type="text_general" indexed="true" stored="true"/>

 </field>


 Then I could search for "some unknown key" and receive "some unknown data".

 Is this possible in Solr?  What is the best way to store this kind of data?



-- 
Lance Norskog
goks...@gmail.com
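The dynamic-field flattening Jack Krupansky describes above (field names like map-name_key-name) can be sketched in plain Java before the flattened fields are handed to SolrJ; the "map_" prefix and the space-to-underscore convention here are only illustrative:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class FlattenMap {
    // Flatten an arbitrary HashMap into Solr-style dynamic-field names,
    // matching a schema pattern such as: dynamicField name="map_*"
    static Map<String, String> toDynamicFields(String prefix, Map<String, String> map) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            // replace spaces so the generated field name stays a single token
            fields.put(prefix + "_" + e.getKey().replace(' ', '_'), e.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("some unknown key", "some unknown data");
        map.put("another unknown key", "more unknown data");
        // each entry becomes one dynamic field, e.g. map_some_unknown_key
        System.out.println(toDynamicFields("map", map));
    }
}
```

Searching for the original key then means querying the derived field name; the hierarchy itself is lost, which is what the HierarchicalFaceting page addresses.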


Re: More debugging DIH - URLDataSource

2012-08-25 Thread Lance Norskog
About XPaths: the XPath engine supports only a limited subset of XPath. The doc
says that your paths are covered.

About logs: you only have the RegexTransformer listed. You need to add
LogTransformer to the transformer list:
http://wiki.apache.org/solr/DataImportHandler#LogTransformer

Having XML entity codes in the URL string seems right. Can you verify
the URL that goes to the remote site? Can you read the logs at the
remote site? Can you run this code through a proxy and watch the data?

On Fri, Aug 24, 2012 at 1:34 PM, Carrie Coy c...@ssww.com wrote:
 I'm trying to write a DIH config to incorporate page-view metrics from an XML
 feed into our index.  The DIH makes a single request, and updates 0 documents.
 I set the log level to finest for the entire dataimport section, but I still
 can't tell what's wrong.  I suspect the XPath.
 http://localhost:8080/solr/core1/admin/dataimport.jsp?handler=/dataimport
 returns 404.  Any suggestions on how I can debug this?

  solr-spec: 4.0.0.2012.08.06.22.50.47

 The XML data:

 <?xml version='1.0' encoding='UTF-8'?>
 <ReportDataResponse>
 <Data>
 <Rows>
 <Row rowKey="P#PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)#N/A#5516196614" rowActionAvailability="0 0 0">
 <Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)</Value>
 <Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
 </Row>
 <Row rowKey="P#PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)#N/A#5521976460" rowActionAvailability="0 0 0">
 <Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)</Value>
 <Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
 </Row>
 </Rows>
 </Data>
 </ReportDataResponse>

 My DIH:

 <dataConfig>
   <dataSource name="coremetrics"
       type="URLDataSource"
       encoding="UTF-8"
       connectionTimeout="5000"
       readTimeout="1"/>

   <document>
     <entity name="coremetrics"
         dataSource="coremetrics"
         pk="id"
         url="https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=**&amp;username=&amp;format=XML&amp;userAuthKey=&amp;language=en_US&amp;viewID=9475540&amp;period_a=M20110930"
         processor="XPathEntityProcessor"
         stream="true"
         forEach="/ReportDataResponse/Data/Rows/Row"
         logLevel="fine"
         transformer="RegexTransformer">

       <field column="part_code" name="id"
           xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']"
           regex="/^PRODUCT:.*\((.*?)\)$/" replaceWith="$1"/>
       <field column="page_views"
           xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_VIEWS']"/>
     </entity>
   </document>
 </dataConfig>

 This little test perl script correctly extracts the data:

 use XML::XPath;
 use XML::XPath::XMLParser;

 my $xp = XML::XPath->new(filename => 'cm.xml');
 my $nodeset = $xp->find('/ReportDataResponse/Data/Rows/Row');
 foreach my $node ($nodeset->get_nodelist) {
     my $page_name = $node->findvalue('Value[@columnId="PAGE_NAME"]');
     my $page_views = $node->findvalue('Value[@columnId="PAGE_VIEWS"]');
     $page_name =~ s/^PRODUCT:.*\((.*?)\)$/$1/;
 }
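Since DIH's XPath support is a limited subset, the same path can also be sanity-checked against the JDK's full XPath engine; a sketch using a trimmed stand-in for the feed above (note the regex here drops the surrounding slashes, which plain Java regex does not use):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathCheck {
    // Trimmed stand-in for the Coremetrics feed shown above
    static final String XML =
        "<ReportDataResponse><Data><Rows><Row>"
      + "<Value columnId=\"PAGE_NAME\">PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)</Value>"
      + "<Value columnId=\"PAGE_VIEWS\">2388</Value>"
      + "</Row></Rows></Data></ReportDataResponse>";

    static String extractPartCode() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes("UTF-8")));
            String pageName = (String) XPathFactory.newInstance().newXPath().evaluate(
                "/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']",
                doc, XPathConstants.STRING);
            // same capture idea as the RegexTransformer field, minus the slashes
            return pageName.replaceAll("^PRODUCT:.*\\((.*?)\\)$", "$1");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractPartCode()); // W4537
    }
}
```

If the JDK engine extracts the value but DIH does not, the path falls outside DIH's supported subset and the config, not the data, is the problem.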

 From logs:

 INFO: Loading DIH Configuration: data-config.xml
 Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter
 loadDataConfig
 INFO: Data Configuration loaded successfully
 Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
 INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import}
 status=0 QTime=2
 Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 INFO: Starting Full Import
 Aug 24, 2012 3:53:10 PM
 org.apache.solr.handler.dataimport.SimplePropertiesWriter
 readIndexerProperties
 INFO: Read dataimport.properties
 Aug 24, 2012 3:53:10 PM org.apache.solr.update.DirectUpdateHandler2
 deleteAll
 INFO: [ssww] REMOVING ALL DOCUMENTS FROM INDEX
 Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.URLDataSource
 getData
 FINE: Accessing URL:
 https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=*&username=***&format=XML&userAuthKey=**&language=en_US&viewID=9475540&period_a=M20110930
 Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
 INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
 QTime=0
 Aug 24, 2012 3:53:12 PM org.apache.solr.core.SolrCore execute
 INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
 QTime=1
 Aug 24, 2012 3:53:14 PM org.apache.solr.core.SolrCore execute
 INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
 QTime=1
 Aug 24, 2012 3:53:16 PM org.apache.solr.core.SolrCore execute
 INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
 QTime=0
 Aug 24, 2012 3:53:18 PM org.apache.solr.core.SolrCore execute
 INFO: [ssww] webapp=/solr path=/dataimport 

Re: Solr - Index Concurrency - Is it possible to have multiple threads write to same index?

2012-08-25 Thread Lance Norskog
A few other things:
Support: many of the Solr committers do not like the Embedded server.
It does not get much attention, so if you find problems with it you
may have to fix them and get someone to review and commit the fixes.
I'm not saying they sabotage it, there just is not much interest in
making it first-class.

Replication: you can replicate from the Embedded server with the old
rsync-based replicator. The Java Replication tool requires servlets.
If you are Unix-savvy, the rsync tool is fine.

Indexing speed:
1) You can use shards to split the index into pieces. This divides the
indexing work among the shards.
2) Do not store the giant data. A lot of sites instead archive the
datafile and index a link to the file. Giant stored fields cause
indexing speed to drop dramatically because stored data is not saved
just once: it is copied repeatedly during merging as new documents are
added. Index data is also copied around, but this tends to increase
sub-linearly since documents share terms.
3) Do not store positions and offsets. These allow you to do phrase
queries because they store the position of each word. They take a lot
of memory, and have to be copied around during merging.
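Points 2 and 3 above translate into schema.xml roughly like this (a sketch only; the field names are illustrative, and omitTermFreqAndPositions disables phrase queries on that field):

```xml
<!-- do not store the giant body; keep only a link to the archived file -->
<field name="body" type="text_general" indexed="true" stored="false"
       omitTermFreqAndPositions="true"/>
<field name="file_url" type="string" indexed="false" stored="true"/>
```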

On Thu, Aug 23, 2012 at 1:31 AM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 I know the following drawbacks of EmbServer:

 - org.apache.solr.client.solrj.request.UpdateRequest.getContentStreams(),
   which is called when handling an update request, produces a lot of garbage
   in memory and bloats it with expensive XML.
 - org.apache.solr.response.BinaryResponseWriter.getParsedResponse(SolrQueryRequest,
   SolrQueryResponse) does something similar on the response side - it just
   bloats your heap.

 For me, your task is covered by Multiple Cores. Anyway, if you are OK with
 EmbeddedServer, let it be. Just be aware of the stream-updates feature:
 http://wiki.apache.org/solr/ContentStream

 My average indexing speed estimate is for fairly small docs, less than 1K
 (which are always used for micro-benchmarking).

 "Much analysis" is the key argument for invoking updates in multiple threads.
 What's your CPU stat during indexing?




 On Thu, Aug 23, 2012 at 7:52 AM, ksu wildcats ksu.wildc...@gmail.comwrote:

 Thanks for the reply Mikhail.

 For our needs the speed is more important than flexibility and we have huge
 text files (ex: blogs / articles ~2 MB size) that needs to be read from our
 filesystem and then store into the index.

 We have our app creating a separate core per client (dynamically), and there
 is one instance of EmbeddedSolrServer for each core that's used for adding
 documents to the index.
 Each document has about 10 fields, and one of the fields has ~2MB of data
 stored (stored=true, analyzed=true).
 Also we have logic built into our webapp to dynamically create the Solr
 config files (solrconfig & schema per core - filter/analyzer/handler values
 can be different for each core) for each core before creating an instance of
 EmbeddedSolrServer for that core.
 Another reason to go with EmbeddedSolrServer is to reduce overhead of
 transporting large data (~2 MB) over http/xml.

 We use this setup for building our master index which then gets replicated
 to slave servers
 using replication scripts provided by solr.
 We also have the Solr admin UI integrated into our webapp (using admin JSP &
 handlers from the Solr admin UI).

 We have been using this MultiCore setup for more than a year now, and so far
 we haven't run into any issues with EmbeddedSolrServer integrated into our
 webapp.
 However I am now trying to figure out the impact if we allow multiple
 threads sending request to EmbeddedSolrServer (same core) for adding docs
 to
 index simultaneously.

 Our understanding was that EmbeddedSolrServer would give us better
 performance over http solr for our needs.
 Its quite possible that we might be wrong and http solr would have given us
 similar/better performance.

 Also based on documentation from SolrWiki I am assuming that
 EmbeddedSolrServer API is same as the one used by Http Solr.

 Said that, can you please tell if there is any specific downside to using
 EmbeddedSolrServer that could cause issues for us down the line.

 I am also interested in your below comment about indexing 1 million docs in
 few mins. Ideally we would like to get to that speed
 I am assuming this depends on the size of the doc and type of
 analyzer/tokenizer/filters being used. Correct?
 Can you please share (or point me to documentation) on how to get this
 speed
 for 1 mil docs.
   - one million is a fairly small amount, in average it should be indexed
  in few mins. I doubt that you really need to distribute indexing

 Thanks
 -K



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Index-Concurrency-Is-it-possible-to-have-multiple-threads-write-to-same-index-tp4002544p4002776.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Sincerely yours
 Mikhail Khludnev
 Tech Lead
 Grid Dynamics

 

Re: How do I represent a group of customer key/value pairs

2012-08-25 Thread Sheldon P
Thanks Lance.  It looks like it's worth investigating.  I've already
started down the path of using a bean with @Field("map_*") on my
HashMap setter.  This defect tipped me off to this functionality:
https://issues.apache.org/jira/browse/SOLR-1357
This technique provides me with a mechanism to store the HashMap data,
but flattens the structure.  I'll play with the ideas provided on
http://wiki.apache.org/solr/HierarchicalFaceting.  If anyone has
some sample code (java + schema.xml) they can point me to that does
Hierarchical Faceting, I would very much appreciate it.


On Sat, Aug 25, 2012 at 6:42 PM, Lance Norskog goks...@gmail.com wrote:
 There are more advanced ways to embed hierarchy in records. This describes 
 them:

 http://wiki.apache.org/solr/HierarchicalFaceting

 (This is a great page, never noticed it.)

 On Fri, Aug 24, 2012 at 8:12 PM, Sheldon P sporc...@gmail.com wrote:
 Thanks for the prompt reply Jack.  Could you point me towards any code
 examples of that technique?


 On Fri, Aug 24, 2012 at 4:31 PM, Jack Krupansky j...@basetechnology.com 
 wrote:
 The general rule in Solr is simple: denormalize your data.

 If you have some maps (or tables) and a set of keys (columns) for each map
 (table), define fields with names like map-name_key-name, such as
 map1_name, map2_name, map1_field1, map2_field1. Solr has dynamic
 fields, so you can define map-name_* to have a desired type - if all the
 keys have the same type.

 -- Jack Krupansky

 -Original Message- From: Sheldon P
 Sent: Friday, August 24, 2012 3:33 PM
 To: solr-user@lucene.apache.org
 Subject: How do I represent a group of customer key/value pairs


 I've just started to learn Solr and I have a question about modeling data
 in the schema.xml.

 I'm using SolrJ to interact with my Solr server.  It's easy for me to store
 key/value pairs where the key is known.  For example, if I have:

 title="Some book title"
 author="The author's name"


 I can represent that data in the schema.xml file like this:

<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>

 I also have data that is stored as a Java HashMap, where the keys are
 unknown:

 Map<String, String> map = new HashMap<String, String>();
 map.put("some unknown key", "some unknown data");
 map.put("another unknown key", "more unknown data");


 I would prefer to store that data in Solr without losing its hierarchy.
 For example:

 <field name="map" type="maptype" indexed="true" stored="true">

 <field name="some unknown key" type="text_general" indexed="true" stored="true"/>

 <field name="another unknown key" type="text_general" indexed="true" stored="true"/>

 </field>


 Then I could search for "some unknown key" and receive "some unknown data".

 Is this possible in Solr?  What is the best way to store this kind of data?



 --
 Lance Norskog
 goks...@gmail.com