Efficient query to obtain DF

2017-10-15 Thread Reth RM
Dear Solr-User Group,

   Can you please suggest an efficient query for retrieving the term to
document frequency (df) mapping of each term at the shard index level?

I know we can get the term to df mapping by applying the termVectors
component; however, the results returned by that component are per document
(each doc with its terms and their df). I was looking for a straightforward
flat list of term-df mappings, similar to how the terms component returns a
term-tf (term frequency) map list.
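
For reference, the terms component can be queried per shard like this (a
sketch; it assumes a /terms handler is registered, as in the sample
configsets, and "body" is a placeholder field name). The reference guide
describes the counts it returns as the number of documents containing each
term, so it may be worth checking whether this already gives the term-df
list you are after:

curl "http://localhost:8983/solr/collection1/terms?terms.fl=body&terms.limit=-1&distrib=false&wt=json"

distrib=false confines the request to the core/shard that receives it, and
terms.limit=-1 asks for all terms.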

Thank you.


Query to obtain count of term vocabulary

2017-10-09 Thread Reth RM
Dear Solr-User Group,

   Can you please suggest an API to query the *count* of the total term
vocabulary in a given shard index for a specified field? For example, in the
reference image, the count of total terms in the "terms" column on the
left-hand side.
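
One option that may work here is the Luke request handler, which reports
per-field details for a single core (a sketch; "body" and the core name are
placeholders):

curl "http://localhost:8983/solr/collection1/admin/luke?fl=body&numTerms=1&wt=json"

If I remember right, the per-field section of the response includes a
"distinct" entry with the number of unique terms in that core's index;
worth verifying on your version.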

Thank you.


Distributed IDF configuration query

2017-09-29 Thread Reth RM
Dear Solr User Group,

I am trying to configure distributed IDF (global df) for a collection
consisting of 3 shards.
Listed below are the configurations applied; however, the debug-explain
results still show the "idf" computed at the shard index level. For example,
I indexed 7 docs in total, 3 of them contained the term "art", and they were
distributed across the 3 shards, so I expected docFreq=3 and docCount=7, but
the results were as seen in the screenshot below.

[image: Inline image 1]

*Configurations applied*

Added the following stats cache config to solrconfig.xml and reloaded the
collection:

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>

Also executed the curl command:
curl http://localhost:8983/solr/test_news_byte_d_idf/config -H
'Content-type:application/json' -d '{ "set-user-property" :
{"solr.statsCache":"org.apache.solr.search.stats.ExactStatsCache"}}'

 Any pointers regarding what else needs to be done as part of the
configuration would be appreciated.
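
One check that may help: the Config API overlay shows whether the user
property actually landed (a sketch, reusing the collection name above; if
the set-user-property call took effect, the property should appear under
"userProps" in the response):

curl "http://localhost:8983/solr/test_news_byte_d_idf/config/overlay?omitHeader=true"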


Re: Solr on HDFS: Streaming API performance tuning

2016-12-16 Thread Reth RM
If you could provide the JSON parse exception stack trace, it might help to
pinpoint the issue there.


On Fri, Dec 16, 2016 at 5:52 PM, Chetas Joshi 
wrote:

> Hi Joel,
>
> The only NON alpha-numeric characters I have in my data are '+' and '/'. I
> don't have any backslashes.
>
> If the special characters was the issue, I should get the JSON parsing
> exceptions every time irrespective of the index size and irrespective of
> the available memory on the machine. That is not the case here. The
> streaming API successfully returns all the documents when the index size is
> small and fits in the available memory. That's the reason I am confused.
>
> Thanks!
>
> On Fri, Dec 16, 2016 at 5:43 PM, Joel Bernstein 
> wrote:
>
> > The Streaming API may have been throwing exceptions because the JSON
> > special characters were not escaped. This was fixed in Solr 6.0.
> >
> >
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi 
> > wrote:
> >
> > > Hello,
> > >
> > > I am running Solr 5.5.0.
> > > It is a solrCloud of 50 nodes and I have the following config for all
> the
> > > collections.
> > > maxShardsperNode: 1
> > > replicationFactor: 1
> > >
> > > I was using Streaming API to get back results from Solr. It worked fine
> > for
> > > a while until the index data size reached beyond 40 GB per shard (i.e.
> > per
> > > node). It started throwing JSON parsing exceptions while reading the
> > > TupleStream data. FYI: I have other services (Yarn, Spark) deployed on
> > the
> > > same boxes on which Solr shards are running. Spark jobs also use a lot
> of
> > > disk cache. So, the free available disk cache on the boxes vary a
> > > lot depending upon what else is running on the box.
> > >
> > > Due to this issue, I moved to using the cursor approach and it works
> fine
> > > but as we all know it is way slower than the streaming approach.
> > >
> > > Currently the index size per shard is 80GB (The machine has 512 GB of
> RAM
> > > and being used by different services/programs: heap/off-heap and the
> disk
> > > cache requirements).
> > >
> > > When I have enough RAM (more than 80 GB so that all the index data
> could
> > > fit in memory) available on the machine, the streaming API succeeds
> > without
> > > running into any exceptions.
> > >
> > > Question:
> > > How different the index data caching mechanism (for HDFS) is for the
> > > Streaming API from the cursorMark approach?
> > > Why cursor works every time but streaming works only when there is a
> lot
> > of
> > > free disk cache?
> > >
> > > Thank you.
> > >
> >
>


Re: Solr MapReduce Indexer Tool is failing for empty core name.

2016-12-15 Thread Reth RM
The primary difference in later versions has been the move from standalone
solr to solr-cloud, starting from solr 4.0. What happens if you try starting
solr in standalone mode? Solr cloud does not consider 'core' anymore; it
considers 'collection' as the param.
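
A quick way to compare the two modes locally (a sketch, assuming a default
install):

bin/solr start      # standalone mode
bin/solr start -c   # SolrCloud mode, with an embedded ZooKeeper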


On Thu, Dec 15, 2016 at 11:05 PM, Manan Sheth <manan.sh...@impetus.co.in>
wrote:

> Thanks Reth. As noted, this is the same map-reduce-based indexer tool that
> comes shipped with the solr distribution by default.
>
> It only takes the zk_host details and extracts all required information
> from there. It does not have core-specific configurations. The same
> tool released with the solr 4.10 distro works correctly; it seems to be
> some issue/change from solr 5 onwards. I have tested it on both solr 5.5
> and solr 6.2.1, and the behaviour is the same for both.
>
> Thanks,
> Manan Sheth
> ________
> From: Reth RM <reth.ik...@gmail.com>
> Sent: Friday, December 16, 2016 12:21 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr MapReduce Indexer Tool is failing for empty core name.
>
> It looks like the command line tool that you are using to initiate the
> index process is expecting a name for the solr core via the respective
> command line param. Use -help on the command line tool that you are using
> and check the solr-core-name parameter key; pass that as well with some
> value.
>
>
> On Tue, Dec 13, 2016 at 5:44 AM, Manan Sheth <manan.sh...@impetus.co.in>
> wrote:
>
> > Hi All,
> >
> >
> > While working on a migration project from Solr 4 to Solr 6, I need to
> > reindex my data using Solr map reduce Indexer tool in offline mode with
> > avro data.
> >
> > While executing the map reduce indexer tool shipped with solr 6.2.1, it
> is
> > throwing error of cannot create core with empty name value. The solr
> > instances are running fine with new indexed are being added and modified
> > correctly. Below is the command that was being fired:
> >
> >
> > hadoop --config /etc/hadoop/conf jar /home/impadmin/solr-6.2.1/
> dist/solr-map-reduce-*.jar
> > -D 'mapred.child.java.opts=-Xmx500m' \
> >-libjars `echo /home/impadmin/solr6lib/*.jar | sed 's/ /,/g'`
> > --morphline-file /home/impadmin/app_quotes_morphline_actual.conf \
> >--zk-host 172.26.45.71:9984 --output-dir hdfs://
> > impetus-i0056.impetus.co.in:8020/user/impadmin/
> > MapReduceIndexerTool/output5 \
> >--collection app.quotes --log4j src/test/resources/log4j.
> properties
> > --verbose \
> >  "hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/
> > MapReduceIndexerTool/5d63e0f8-afc1-483e-bd3f-d508c885d794-00"
> >
> >
> > Below is the complete snapshot of error trace:
> >
> >
> > Failed to initialize record writer for org.apache.solr.hadoop.
> > MapReduceIndexerTool/MorphlineMapper, attempt_1479795440861_0343_r_
> > 00_0
> > at org.apache.solr.hadoop.SolrRecordWriter.<init>(
> > SolrRecordWriter.java:128)
> > at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(
> > SolrOutputFormat.java:163)
> > at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.
> > <init>(ReduceTask.java:540)
> > at org.apache.hadoop.mapred.ReduceTask.runNewReducer(
> > ReduceTask.java:614)
> > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:422)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(
> > UserGroupInformation.java:1709)
> > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> > Caused by: org.apache.solr.common.SolrException: Cannot create core with
> > empty name value
> > at org.apache.solr.core.CoreDescriptor.checkPropertyIsNotEmpty(
> > CoreDescriptor.java:280)
> > at org.apache.solr.core.CoreDescriptor.<init>(
> CoreDescriptor.java:191)
> > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:754)
> > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
> > at org.apache.solr.hadoop.SolrRecordWriter.createEmbeddedSolrServer(
> > SolrRecordWriter.java:163)
> > at org.apache.solr.hadoop.SolrRecordWriter.<init>(
> SolrRecordWriter.java:121)
> > ... 9 more
> >
> > Additional points to note:
> >
> >
> >   *   The solrconfig and schema files are copied as is from Solr 4.
> >   *   Once collection is deployed, user can perform all operations on the
> > collection without any issue.

Re: Solr - Amazon like search

2016-12-15 Thread Reth RM
There's an e-commerce features checklist covering what solr can do, listed here:
https://lucidworks.com/blog/2011/01/25/implementing-the-ecommerce-checklist-with-apache-solr-and-lucidworks/

That should be a good start. I would try all of those features, and if
there is a more specific feature that you are looking for and are unable to
find a way to incorporate in solr, ask about it as a new question.



On Wed, Dec 14, 2016 at 5:48 AM, Shawn Heisey  wrote:

> On 12/13/2016 10:55 PM, vasanth vijayaraj wrote:
> > We are building an e-commerce mobile app. I have implemented Solr search
> and autocomplete.
> > But we like the Amazon search and are trying to implement something like
> that. Attached a screenshot
> > of what has been implemented so far
> >
> > The search/suggest should sort list of products based on popularity,
> document hits and more.
> > How do we achieve this? Please help us out here.
>
> Your attachment didn't make it to the list.  They rarely do.  We can't
> see whatever it is you were trying to include.
>
> Sorting on things like popularity and hits requires putting that
> information into the index so that each document has fields that encode
> this information, allowing you to use Solr's standard sorting
> functionality with those fields.  You also need a process to update that
> information when there's a new hit.  It's possible, but you have to
> write this into your indexing system.
>
> Solr doesn't include special functionality for this.  It would be hard
> to generalize, and it can all be done without special functionality.
>
> Thanks,
> Shawn
>
>


Re: (Newbie Help!) Seeking guidance in regards to Solr's suggestor and others

2016-12-15 Thread Reth RM
This issue is in the solarium-client PHP code, which is likely not
traversing far enough to pick up results from the collation tag of the solr
response. See line 190:
https://github.com/solariumphp/solarium/blob/master/library/Solarium/QueryType/Suggester/Result/Result.php#L190
Verify whether this is the issue and open a pull request; the solarium
contributors might fix it.





On Mon, Dec 12, 2016 at 5:23 PM, KV Medmeme  wrote:

> Hi Friends,
>
> I'm new to solr, been working on it for the past 2-3 months trying to
> really get my feet wet with it so that I can transition the current search
> engine at my current job to solr. (Eww sphinx  haha) Anyway, I need some
> help. I was running around the net getting my suggester working and I'm
> stuck. This is what I have so far (I will explain
> after I post links to the config files):
>
> here is a link to my managed-schema.xml
> http://pastebin.com/MiEWwESP
>
> solr config.xml
> http://pastebin.com/fq2yxbvp
>
> I am currently using Solr 6.2.1, my issue is..
>
> I am trying to build a suggester that builds search terms or phrases based
> off of the index that is in memory. I was playing around with the analyzers
> and the tokenizers as well as reading some very old books that touch base
> on solr 4. And I came up with this set of tokenizers and analyzer chain.
> Please correct it if its wrong. But my index contains Medical Abstracts
> published by Doctors and terms that I would really need to search for are
> "brain cancer" , "anti-inflammatory" , "hiv-1" kinda see where im going
> with? So i need to sorta preserve the white space and some sort of hyphen
> delimiter. After I discovered that, (now here comes the fun part)
>
> I type in the url:
>
> http://localhost:8983/solr/AbstractSuggest/suggest/?spellcheck.build=true
>
> then after when its built I query,
>
> http://localhost:8983/solr/AbstractSuggest/suggest/?
> spellcheck.q=suggest_field:%22anti-infl%22
>
> Which is perfectly fine; it works great. I can see the collations, so in
> my dropdown search bar, when clients search these medical articles they
> can see these terms. Now, in regards to PHP (the solarium api to talk to
> solr): since this is a website and I intend on making an AJAX call to php, I
> cannot see the collation list. Solarium fails on hyphenated terms as well
> as fails on building the collations list. For example, if I type in
>
> "brain canc" ( i want to search brain cancer)
>
> It auto-suggests brain, then cancer, but in collations nothing is shown. If
> I send this to the URL (a localhost url, which will soon change when
> moved to the prod environment) I can see the collations. A screenshot is here..
>
> brain can (url) -> https://gyazo.com/30a9d11e4b9b73b0768a12d342223dc3
>
> bran canc(solarium) -> https://gyazo.com/507b02e50d0e39d7daa96655dff83c76
> php code ->https://gyazo.com/1d2b8c90013784d7cde5301769cd230c
>
> So here is where I am. The ideal goal is to have the PHP api produce the
> same results just like the URL so when users type into a search bar they
> can see the collations.
>
>  Can someone please help? I'm looking towards the community as the savior
> for all my problems. I want to learn about solr at the same time, so if
> future problems pop up I can solve them accordingly.
>
> Thanks!
> Happy Holidays
> Kevin.
>


Re: error diagnosis help.

2016-12-15 Thread Reth RM
Are you indexing XML files through nutch? This exception looks like it comes
from parsing a malformed XML file: the stray '&' appears after the closing
root element (the "epilog"), which usually means the posted body is not
well-formed XML.

On Mon, Dec 12, 2016 at 11:53 AM, KRIS MUSSHORN 
wrote:

> ive scoured my nutch and solr config files and I cant find any cause.
> suggestions?
> Monday, December 12, 2016 2:37:13 PM  ERROR  null
> RequestHandlerBase  org.apache.solr.common.SolrException: Unexpected
> character '&' (code 38) in epilog; expected '<'
> org.apache.solr.common.SolrException: Unexpected character '&' (code 38)
> in epilog; expected '<'
>  at [row,col {unknown-source}]: [1,36]
> at org.apache.solr.handler.loader.XMLLoader.load(
> XMLLoader.java:180)
> at org.apache.solr.handler.UpdateRequestHandler$1.load(
> UpdateRequestHandler.java:95)
> at org.apache.solr.handler.ContentStreamHandlerBase.
> handleRequestBody(ContentStreamHandlerBase.java:70)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.java:156)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
> at org.apache.solr.servlet.HttpSolrCall.execute(
> HttpSolrCall.java:658)
> at org.apache.solr.servlet.HttpSolrCall.call(
> HttpSolrCall.java:457)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:223)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:181)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1652)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:585)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:577)
> at org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:223)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1127)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:515)
> at org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1061)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.
> handle(ContextHandlerCollection.java:215)
> at org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.java:110)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:499)
> at org.eclipse.jetty.server.HttpChannel.handle(
> HttpChannel.java:310)
> at org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:257)
> at org.eclipse.jetty.io.AbstractConnection$2.run(
> AbstractConnection.java:540)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:635)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(
> QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
>
>


Re: Solr MapReduce Indexer Tool is failing for empty core name.

2016-12-15 Thread Reth RM
It looks like the command line tool that you are using to initiate the index
process is expecting a name for the solr core via the respective command
line param. Use -help on the command line tool that you are using and check
the solr-core-name parameter key; pass that as well with some value.


On Tue, Dec 13, 2016 at 5:44 AM, Manan Sheth 
wrote:

> Hi All,
>
>
> While working on a migration project from Solr 4 to Solr 6, I need to
> reindex my data using Solr map reduce Indexer tool in offline mode with
> avro data.
>
> While executing the map reduce indexer tool shipped with solr 6.2.1, it is
> throwing error of cannot create core with empty name value. The solr
> instances are running fine with new indexed are being added and modified
> correctly. Below is the command that was being fired:
>
>
> hadoop --config /etc/hadoop/conf jar 
> /home/impadmin/solr-6.2.1/dist/solr-map-reduce-*.jar
> -D 'mapred.child.java.opts=-Xmx500m' \
>-libjars `echo /home/impadmin/solr6lib/*.jar | sed 's/ /,/g'`
> --morphline-file /home/impadmin/app_quotes_morphline_actual.conf \
>--zk-host 172.26.45.71:9984 --output-dir hdfs://
> impetus-i0056.impetus.co.in:8020/user/impadmin/
> MapReduceIndexerTool/output5 \
>--collection app.quotes --log4j src/test/resources/log4j.properties
> --verbose \
>  "hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/
> MapReduceIndexerTool/5d63e0f8-afc1-483e-bd3f-d508c885d794-00"
>
>
> Below is the complete snapshot of error trace:
>
>
> Failed to initialize record writer for org.apache.solr.hadoop.
> MapReduceIndexerTool/MorphlineMapper, attempt_1479795440861_0343_r_
> 00_0
> at org.apache.solr.hadoop.SolrRecordWriter.<init>(
> SolrRecordWriter.java:128)
> at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(
> SolrOutputFormat.java:163)
> at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.
> <init>(ReduceTask.java:540)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(
> ReduceTask.java:614)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1709)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: org.apache.solr.common.SolrException: Cannot create core with
> empty name value
> at org.apache.solr.core.CoreDescriptor.checkPropertyIsNotEmpty(
> CoreDescriptor.java:280)
> at org.apache.solr.core.CoreDescriptor.<init>(CoreDescriptor.java:191)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:754)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
> at org.apache.solr.hadoop.SolrRecordWriter.createEmbeddedSolrServer(
> SolrRecordWriter.java:163)
> at
> org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:121)
> ... 9 more
>
> Additional points to note:
>
>
>   *   The solrconfig and schema files are copied as is from Solr 4.
>   *   Once collection is deployed, user can perform all operations on the
> collection without any issue.
>   *   The indexation process is working fine with the same tool on Solr 4.
>
> Please help.
>
>
> Thanks,
>
> Manan Sheth
>
> 
>
>


Re: Solr on HDFS: increase in query time with increase in data

2016-12-15 Thread Reth RM
I think the shard index size is huge and the shards should be split.
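
If splitting is the route taken, the Collections API supports doing it
online (a sketch; the collection and shard names are placeholders, and the
operation needs substantial free disk space while the sub-shards are built):

curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1"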

On Wed, Dec 14, 2016 at 10:58 AM, Chetas Joshi 
wrote:

> Hi everyone,
>
> I am running Solr 5.5.0 on HDFS. It is a solrCloud of 50 nodes and I have
> the following config.
> maxShardsperNode: 1
> replicationFactor: 1
>
> I have been ingesting data into Solr for the last 3 months. With increase
> in data, I am observing increase in the query time. Currently the size of
> my indices is 70 GB per shard (i.e. per node).
>
> I am using cursor approach (/export handler) using SolrJ client to get back
> results from Solr. All the fields I am querying on and all the fields that
> I get back from Solr are indexed and have docValues enabled as well. What
> could be the reason behind increase in query time?
>
> Has this got something to do with the OS disk cache that is used for
> loading the Solr indices? When a query is fired, will Solr wait for all
> (70GB) of disk cache being available so that it can load the index file?
>
> Thanks!
>


Re: Apply patch steps and update solr with new patch

2016-12-01 Thread Reth RM
Hi Erick,

Does  "ant server" executed under "/solr/" also compiles/includes
changes of patch made to lucene libraries?
This patch is adding/modifying lucene modules.



On Thu, Dec 1, 2016 at 5:02 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> You need to execute the target "ant package" from the solr directory
> (i.e. the sibling of the lucene directory) under where you cloned the
> Git repo.
>
> You should then find a distro just like you'd download from one of the
> mirrors, I don't quite remember where now, in build? dist?
>
> If you're testing locally, you don't have to build the package target,
> the 'ant server' target will compile the patch into /solr/
> wherever and you can start the changed code using the bin/solr script.
>
> P.S. I often build "ant server dist' to be sure I have the jar files
> for any SolrJ program I happen to be working with too...
>
> Best,
> Erick
>
> On Thu, Dec 1, 2016 at 1:52 PM, Reth RM <reth.ik...@gmail.com> wrote:
> > Hi,
> >
> > I followed the below steps to apply a patch, but have issues, any
> pointers
> > to mistake or blogs to apply patch and update solr with patch, will be
> > helpful.
> >
> > 1. git clone https://github.com/apache/lucene-solr.git
> > 2. ant clean, ant compile ant idea
> > 3. open project in idea(intellij)
> > 4. apply patch option in intellij
> > <https://www.jetbrains.com/help/idea/2016.2/applying-patches.html>,
> > downloaded and applied :
> > https://issues.apache.org/jira/secure/attachment/
> 12820564/LUCENE-2899.patch
> > 5. ant clean compiled again. (successful, in the intellij I can see the
> > files in patch 2899 added too)
> >
> > I am confused, after this point What should I copy from this trunk to use
> > solr as binary and use this patch functionality?
>


Apply patch steps and update solr with new patch

2016-12-01 Thread Reth RM
Hi,

I followed the below steps to apply a patch, but have issues, any pointers
to mistake or blogs to apply patch and update solr with patch, will be
helpful.

1. git clone https://github.com/apache/lucene-solr.git
2. ant clean, ant compile ant idea
3. open project in idea(intellij)
4. apply patch option in intellij,
downloaded and applied:
https://issues.apache.org/jira/secure/attachment/12820564/LUCENE-2899.patch
5. ant clean compiled again. (successful, in the intellij I can see the
files in patch 2899 added too)

I am confused: after this point, what should I copy from this trunk to use
solr as a binary with this patch's functionality?
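
From Erick's reply earlier in this archive, the build-and-run sequence
amounts to roughly this (a sketch; the exact output location of "ant
package" may vary by branch):

cd lucene-solr/solr
ant server         # compile the patched code in place
bin/solr start     # run the patched checkout directly
# or: ant package  # build a full distro, like a mirror download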


Re: Wildcard searches with space in TextField/StrField

2016-11-23 Thread Reth RM
what is the fieldType of those records?

On Tue, Nov 22, 2016 at 4:18 AM, Sandeep Khanzode <
sandeep_khanz...@yahoo.com.invalid> wrote:

> Hi Erick,
> I gave this a try.
> These are my results. There is a record with "John D. Smith", and another
> named "John Doe".
>
> 1.] {!complexphrase inOrder=true}name:"John D.*" ... does not fetch any
> results.
>
> 2.] {!complexphrase inOrder=true}name:"John D*" ... fetches both results.
>
>
>
> Second observation: There is a record with "John D Smith"
> 1.] {!complexphrase inOrder=true}name:"John*" ... does not fetch any
> results.
>
> 2.] {!complexphrase inOrder=true}name:"John D*" ... fetches that record.
>
> 3.] {!complexphrase inOrder=true}name:"John D S*" ... fetches that record.
>
> SRK
>
> On Sunday, November 13, 2016 7:43 AM, Erick Erickson <
> erickerick...@gmail.com> wrote:
>
>
>  Right, for that kind of use case you want complexPhraseQueryParser,
> see: https://cwiki.apache.org/confluence/display/solr/Other+
> Parsers#OtherParsers-ComplexPhraseQueryParser
>
> Best,
> Erick
>
> On Sat, Nov 12, 2016 at 9:39 AM, Sandeep Khanzode
> <sandeep_khanz...@yahoo.com> wrote:
> > Thanks, Erick.
> >
> > I am actually not trying to use the String field (prefer a TextField
> here).
> > But, in my comparisons with TextField, it seems that something like
> phrase
> > matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*',
> or
> > say, 'my dog has*') can only be accomplished with a string type field,
> > especially because, with a WhitespaceTokenizer in TextField, the space
> will
> > be lost, and all tokens will be individually considered. Am I missing
> > something?
> >
> > SRK
> >
> >
> > On Friday, November 11, 2016 10:05 PM, Erick Erickson
> > <erickerick...@gmail.com> wrote:
> >
> >
> > You have to query text and string fields differently, that's just the
> > way it works. The problem is getting the query string through the
> > parser as a _single_ token or as multiple tokens.
> >
> > Let's say you have a string field with the "a b" example. You have a
> > single token
> > a b that starts at offset 0.
> >
> > But with a text field, you have two tokens,
> > a at position 0
> > b at position 1
> >
> > But when the query parser sees "a b" (without quotes) it splits it
> > into two tokens, and only the text field has both tokens so the string
> > field won't match.
> >
> > OTOH, when the query parser sees "a\ b" it passes this through as a
> > single token, which only matches the string field as there's no
> > _single_ token "a b" in the text field.
> >
> > But a more interesting question is why you want to search this way.
> > String fields are intended for keywords, machine-generated IDs and the
> > like. They're pretty useless for searching anything except
> > 1> exact tokens
> > 2> prefixes
> >
> > While if you have "my dog has fleas" in a string field, you _can_
> > search "*dog*" and get a hit but the performance is poor when you get
> > a large corpus. Performance for "my*" will be pretty good though.
> >
> > In all this sounds like an XY problem, what's the use-case you're
> > trying to solve?
> >
> > Best,
> > Erick
> >
> >
> >
> > On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
> > <sandeep_khanz...@yahoo.com.invalid> wrote:
> >> Hi Erick, Reth,
> >>
> >> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only
> >> for StrField for me.
> >>
> >> Any attempt at creating a 'a\ b*' for a TextField does not match any
> >> documents. The parsedQuery in debug mode does show 'field:a b*'. I am
> sure
> >> there are documents that should match.
> >> Another (maybe unrelated) observation is if I have 'field:a\ b', then
> the
> >> parsedQuery is field:a field:b. Which does not match as expected
> (matches
> >> individually).
> >>
> >> Can you please provide an example that I can use in Solr Query
> dashboard?
> >> That will be helpful.
> >>
> >> I have also seen that wildcard queries work irrespective of field type
> >> i.e. StrField as well as TextField. That makes sense because with a
> >> WhitespaceTokenizer only creates word boundaries when we do not use a
> >> EdgeNGramFilter. If I am not wrong, that is. SRK
>

Re: Editing schema and solrconfig files

2016-11-14 Thread Reth RM
There's a way to add/update/delete schema fields; this is helpful:
https://jpst.it/Pqqz
although there is no way to add a fieldType that way.
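
For the field operations, the Schema API can be driven with curl (a sketch;
the collection name and the field definition are placeholders):

curl -X POST -H 'Content-type:application/json' \
  --data-binary '{"add-field":{"name":"mynewfield","type":"text_general","stored":true}}' \
  http://localhost:8983/solr/mycollection/schema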

On Wed, Nov 9, 2016 at 2:20 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> We had the bright idea of allowing editing of the config files through
> the UI... but the ability to upload arbitrary XML is a security
> vulnerability, so that idea was nixed.
>
> The solr/bin script has an upconfig and downconfig command that are (I
> hope) easier to use than zkcli, I think from 5.5. In Solr 6.2 the
> solr/bin script has been enhanced to allow other ZK operations. Not
> quite what you were looking for, but I thought I'd mention it.
>
> There are some ZK clients out there that'll let you edit files
> directly in ZK, and I know IntelliJ also has a plugin that'll allow
> you to do that from the IDE, don't know about Eclipse but I expect it
> does.
>
> I usually edit them locally and set up a shell script to push them up
> as necessary...
>
> FWIW,
> Erick
>
> On Wed, Nov 9, 2016 at 2:09 PM, John Bickerstaff
> <j...@johnbickerstaff.com> wrote:
> > I never found a way to do it through the UI... and ended up using "nano"
> on
> > linux for simple things.
> >
> > For more complex stuff, I scp'd the file (or the whole conf directory) up
> > to my dev box (a Mac in my case) and edited in a decent UI tool, then
> scp'd
> > the whole thing back...  I wrote a simple bash script to automate the scp
> > process on both ends once I got tired of typing it over and over...
> >
> > On Wed, Nov 9, 2016 at 3:05 PM, Reth RM <reth.ik...@gmail.com> wrote:
> >
> >> What are some easiest ways to edit/modify/add conf files, such as
> >> solrconfig.xml and schema.xml other than APIs end points or using zk
> >> commands to re-upload modified file?
> >>
> >> In other words, can we edit conf files through solr admin (GUI)
> >> interface(add new filed by click on button or add new request handler on
> >> click?)  with feature of enabling/disabling same feature as required?
> >>
>


Re: Wildcard searches with space in TextField/StrField

2016-11-10 Thread Reth RM
I don't think you can do a wildcard on a StrField. For a text field, if your
query is "category:(test m*)", the parsed query will be "category:test OR
category:m*".
You can add q.op=AND to make it an AND between those terms.

For phrase-type wildcard query support, as per the docs, it
is ComplexPhraseQueryParser that supports it. (I haven't tested it myself.)

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
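
A request sketch with that parser (the field name and query are
placeholders; the braces, spaces, and quotes need URL-encoding when sent via
curl):

curl 'http://localhost:8983/solr/collection1/select?q=%7B!complexphrase%20inOrder=true%7Dname:%22john%20d*%22'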

On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
sandeep_khanz...@yahoo.com.invalid> wrote:

> Hi,
> How does a search like abc* work in StrField. Since the entire thing is
> stored as a single token, is it a type of a trie structure that allows such
> wildcard matching?
> How can searches with space like 'a b*' be executed for text fields
> (tokenized on whitespace)? If we specify this type of query, it is broken
> down into two queries with field:a and field:b*. I would like them to be
> contiguous, sort of, like a phrase search with wild card.
> SRK


Re: SolrCloud Configuration

2016-11-10 Thread Reth RM
The easiest way is to create a /lib directory under each solr node and place
the custom jar in it. But I don't think it gets distributed over the
cluster, so this approach requires the jar to be placed manually on each
node. IIRC, it was recommended that such custom jars live on disk rather
than be uploaded to zookeeper.
Let's wait for the experts' thoughts and suggestions.


On Thu, Nov 10, 2016 at 1:40 AM, Wunna Lwin  wrote:

> Hi,
>
> I am using solrCloud version 6.2.1 and everything is fine, but I would
> like to add some additional custom plugins into solrCloud.
>
> So, I upload my custom lib using Blob api and create requestHandler and
> components using config api into .system collection. But config api didn't
> load my custom lib when I use custom lib.
>
> Is there any specific documents to upload custom lib?
> I also want to know different between Blob api and Config api.
>
> Also the last question is I create .system collection and upload custom lib
> to that collection to share config files to all collections. Is this right
> way or should I upload explicit to main collection?
>
> Thanks
> Wunna
>


Editing schema and solrconfig files

2016-11-09 Thread Reth RM
What are some easiest ways to edit/modify/add conf files, such as
solrconfig.xml and schema.xml other than APIs end points or using zk
commands to re-upload modified file?

In other words, can we edit conf files through solr admin (GUI)
interface(add new filed by click on button or add new request handler on
click?)  with feature of enabling/disabling same feature as required?


Re: For TTL, does expirationFieldName need to be indexed?

2016-10-17 Thread Reth RM
Yes, I think the field has to be indexed. If I understand correctly,
DocExpirationUpdateProcessorFactory uses this field in its delete query, so
it should be indexed=true.


On Mon, Oct 17, 2016 at 11:35 AM, Brent  wrote:

> In my solrconfig.xml, I have:
>
>   <updateRequestProcessorChain default="true">
>     <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
>       <int name="autoDeletePeriodSeconds">30</int>
>       <str name="expirationFieldName">expire_at</str>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
>
> and in my schema, I have:
>
> <field name="expire_at" type="tdate" indexed="true" stored="true" multiValued="false"/>
>
> If I change it to indexed="false", will it still work? If so, is there any
> benefit to having the field indexed if I'm not using it in any way except
> to
> allow the expiration processor to remove expired documents?
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/For-TTL-does-expirationFieldName-need-to-
> be-indexed-tp4301522.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Sharding strategies

2016-10-10 Thread Reth RM
If you will have numerous documents, splitting the documents into shards is
a good strategy. This split is independent of the language of the documents.

For documents in different languages, it is necessary to use
language-specific analyzers to obtain good search results. For example,
assume you have English-language documents: their _text_ field should
ideally be text_en; likewise, for Chinese/Japanese/Korean documents, the
fields' fieldType should be text_cjk. If you mix documents of different
languages in the same shard, then you will have to define multiple
fieldTypes, one for each document language, and also manage this at query
time, ensuring you query the respective fields.

There are different strategies that can be applied for multilingual search;
slide 19 in this ppt explains them:
http://www.slideshare.net/treygrainger/semantic-multilingual-strategies-in-lucenesolr
And there's another article here, based on the assumption that we know the
language of the incoming document and the language in which the query could be:
https://support.lucidworks.com/hc/en-us/articles/203718886-How-to-implement-Multilingual-Search-using-Solr





On Mon, Oct 10, 2016 at 8:08 AM, Customer  wrote:

> Hi,
>
>
> I'm started working on the project which will likely have lots of
> documents in every single language and because of that I'm a bit worried
> storing everything into one single shard. What would be the best way for
> data store, any advices how I should split my data ? I was thinking about
> going for alphabet (make a shard for every single alphabet letter, but
> knowing fact that there will be lots of languages - not only English, this
> is not an option).
>
> Thank you for your advicesin advance.
>


Re: Distributing nodes with the collections API RESTORE command

2016-09-16 Thread Reth RM
Which version of solr? Afaik, until 6.1, solr backup and restore command
apis required to do separate backup for each shard, and then restore in
similar lines( both go for each). 6.1 version seems to have new feature of
backing up entire collection records and then restoring it back to new
collection setup(did not try yet).


On Thu, Sep 15, 2016 at 1:45 PM, Stephen Lewis  wrote:

> Hello,
>
> I have a solr cloud cluster in a test environment running 6.1 where I am
> looking at using the collections API BACKUP and RESTORE commands to manage
> data integrity.
>
> When restoring from a backup, I'm finding the same behavior occurs every
> time; after the restore command, all shards are being hosted on one node.
> What's especially surprising about this is that there are 6 live nodes
> beforehand, the collection has maxShardsPerNode set to 1, and this occurs
> even if I pass through the parameter maxShardsPerNode=1 to the API call. Is
> there perhaps somewhere else I need to configure something, or another step
> I am missing? If perhaps I'm misunderstanding the intention of these
> parameters, could you clarify for me and let me know how to support
> restoring different shards on different nodes?
>
> Full repro below.
>
> Thanks!
>
>
> *Repro*
>
> *Cluster state before*
>
> http://54.85.30.39:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json
>
> {
>   "responseHeader" : {"status" : 0,"QTime" : 4},
>   "cluster" : {
> "collections" : {},
> "live_nodes" : [
>   "172.18.7.153:8983_solr",
>"172.18.2.20:8983_solr",
>"172.18.10.88:8983_solr",
>"172.18.6.224:8983_solr",
>"172.18.8.255:8983_solr",
>"172.18.2.21:8983_solr"]
>   }
> }
>
>
> *Restore Command (formatted for ease of reading)*
>
> http://54.85.30.39:8983/solr/admin/collections?action=RESTORE
>
> =panopto
> =backup-4
>
> =/mnt/beta_solr_backups
> =2016-09-02
>
> =1
>
> 
> 
> 0
> 16
> 
> backup-4
> 
>
>
> *Cluster state after*
>
> http://54.85.30.39:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json
>
> {
>   "responseHeader" : {"status" : 0,"QTime" : 8},
>   "cluster" : {
> "collections" : {
>   "panopto" : {
> "replicationFactor" : "1",
> "shards" : {
>   "shard2" : {
> "range" : "0-7fff",
> "state" : "construction",
> "replicas" : {
>   "core_node1" : {
> "core" : "panopto_shard2_replica0",
> "base_url" : "http://172.18.2.21:8983/solr;,
> "node_name" : "172.18.2.21:8983_solr",
> "state" : "active",
> "leader" : "true"
>   }
> }
>   },
>   "shard1" : {
> "range" : "8000-",
> "state" : "construction",
> "replicas" : {
>   "core_node2" : {
> "core" : "panopto_shard1_replica0",
> "base_url" : "http://172.18.2.21:8983/solr;,
> "node_name" : "172.18.2.21:8983_solr",
> "state" : "active",
> "leader" : "true"
>   }
> }
>   }
> },
> "router" : {
>   "name" : "compositeId"
> },
> "maxShardsPerNode" : "1",
> "autoAddReplicas" : "false",
> "znodeVersion" : 44,
> "configName" : "panopto"
>   }
> },
> "live_nodes" : ["172.18.7.153:8983_solr", "172.18.2.20:8983_solr",
> "172.18.10.88:8983_solr", "172.18.6.224:8983_solr", "172.18.8.255:8983
> _solr",
> "172.18.2.21:8983_solr"]
>   }
> }
>
>
>
>
> --
> Stephen
>
> (206)753-9320
> stephen-lewis.net
>


Re: Exception is thrown when using TimestampUpdateProcessorFactory

2016-09-16 Thread Reth RM
Hi Preeti,

Try adding a default attribute to the solrtimestamp field in the schema (for
example, default="NOW") and check whether this resolves the issue.

See "Defining Fields" for the default attribute:
https://cwiki.apache.org/confluence/display/solr/Defining+Fields


On Thu, Sep 15, 2016 at 5:32 AM, preeti kumari 
wrote:

> Hi All,
>
> I am trying to get solr index time as solrtimestamp field.
>
>
> <field name="solrtimestamp" type="date" indexed="true" stored="true" omitNorms="true"/>
>
> I am using solr 5.2.1 in solr cloud mode.
>
>
> <updateRequestProcessorChain name="script">
>   <processor class="solr.TimestampUpdateProcessorFactory">
>     <str name="fieldName">solrtimestamp</str>
>   </processor>
>   <processor class="solr.StatelessScriptUpdateProcessorFactory">
>     <str name="script">update-script.js</str>
>     <lst name="params">
>       <str name="config_param">example config parameter</str>
>     </lst>
>     xnum,xnum2
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> But I am getting below exception when i run update or through DIH. Please
> let me know how to fix this.
>
> java.lang.NullPointerException
> at
> org.apache.solr.update.processor.TimestampUpdateProcessorFactor
> y$1.getDefaultValue(TimestampUpdateProcessorFactory.java:66)
> at
> org.apache.solr.update.processor.AbstractDefaultValueUpdateProc
> essorFactory$DefaultValueUpdateProcessor.processAdd(
> AbstractDefaultValueUpdateProcessorFactory.java:91)
> at
> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:71)
> at
> org.apache.solr.handler.dataimport.DataImportHandler$
> 1.upload(DataImportHandler.java:259)
> at
> org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:524)
> at
> org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:414)
> at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:
> 329)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
> at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.
> java:416)
> at
> org.apache.solr.handler.dataimport.DataImporter.
> runCmd(DataImporter.java:480)
> at
> org.apache.solr.handler.dataimport.DataImporter$1.run(
> DataImporter.java:461)
>


Re: Migrate data from solr4.9 to solr6.1

2016-08-30 Thread Reth RM
>>Is there any way through which I can migrate my index which is currently
on 4.9 to 6.1?
You should try copying the existing indexes to the latest solr 6.x and
executing an optimize command. Let us know your findings.

>>I would be using solrcloud on solr 6.1.0 and will be having more number of
shards than my previous set-up.

You can use the shard splitting feature if the above step creates the
indexes without any issues.
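
The optimize can be issued per collection once the index directories are
copied into place (a sketch; the collection name is a placeholder):

curl "http://localhost:8983/solr/mycollection/update?optimize=true&maxSegments=1"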



On Mon, Aug 29, 2016 at 11:09 AM, Piyush Kunal 
wrote:

> I would be using solrcloud on solr 6.1.0 and will be having more number of
> shards than my previous set-up.
>
> On Mon, Aug 29, 2016 at 11:38 PM, Piyush Kunal 
> wrote:
>
> > Is there any way through which I can migrate my index which is currently
> > on 4.9 to 6.1?
> >
> > Looking for something backup and restore.
> >
>


Re: language configuration in update extract request handler

2016-06-06 Thread Reth RM
This question should be posted on the tika mailing list. It is not related
to indexing or search, but to parsing the content of images.

On Sun, Jun 5, 2016 at 10:20 PM, SIDDHAST® Roshan 
wrote:

> Hi All,
>
> we are using the application for indexing and searching text using
> solr. we refered the guide posted
>
> http://hortonworks.com/hadoop-tutorial/indexing-and-searching-text-within-images-with-apache-solr/
>
> Problem: we are want to index hindi images. we want to know how to set
> configuration parameter of tesseract via tika or external params
>
> --
> Roshan Agarwal
> Siddhast®
> 907 chandra vihar colony
> Jhansi-284002
> M:+917376314900
>


Re: Indexing a (File attached to a document)

2016-05-12 Thread Reth RM
Could you please let us know which crawler you are using to fetch data from
the document and its attachment?


On Thu, May 12, 2016 at 3:26 PM, Solr User  wrote:

> Hi
>
> If I index a document with a file attachment attached to it in solr, can I
> visualise data of that attached file attachment also while querying that
> particular document? Please help me on this
>
>
> Thanks & Regards
> Vidya Nadella
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-a-File-attached-to-a-document-tp4276334.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Multi-word Synonyms Solr 4.3.1 does not work

2016-05-06 Thread Reth RM
Right, this is a known issue. There is currently an active jira that you
may like to watch: https://issues.apache.org/jira/browse/SOLR-5379

Another possible workaround is explained here:
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

On Fri, May 6, 2016 at 11:51 AM, SRINI SOLR  wrote:

> Hi All -
> Can you please help me out on the multi-word synonyms with Solr 4.3.1.
>
> I am using the synonyms as below 
>
> test1,test2 => movie1 cinema,movie2 cinema,movie3 cinema
>
> I am able to success with the above syntax  like - if I search for
> words like test1 or test2  then right hand side multi-word values are
> shown.
>
> But -
>
> I have a synonyms like below - multi-word on both the side left-hand and
> right-hand...
>
> test1 test, test2 test, test3 test =>movie1 cinema,movie2 cinema,movie3
> cinema
>
> With the above left-hand multi-word format - not working as expected 
> means 
>
> Here below is the configuration I am using on query analyzer ...
>
>  ignoreCase="true"  expand="true"
> tokenizerFactory="solr.KeywordTokenizerFactory"/>
>
>
> Please Help me 
>


Re: Error - Too many close [count:-1]

2016-04-30 Thread Reth RM
Could you please give some more background on this issue? Was it reported
while indexing or querying? What is the version of solr?


On Sat, Apr 30, 2016 at 12:04 AM, Vipul Gupta  wrote:

> Solr team - Any pointers on fixing this issue ?
>
> [10:29:08] ERROR 0-thread-7 o.a.s.c.SolrCore <> Too many close [count:-1]
> on
> org.apache.solr.core.SolrCore@3d6f8ad3. Please report this exception to
> solr-user@lucene.apache.org
>


Re: ANN: Solr puzzle: Magic Date

2016-04-27 Thread Reth RM
Yes, these can be practice/interview questions. But, considering the
specific example above, the question seems to hinge on spotting a syntax
error(?); it should not be expected that a developer/solr-user knows the
right syntax or commands by heart. What could be interesting are questions
related to cloud concepts, ranking concepts (tf-idf, bm25), or simple
problem statements that ask how something can be implemented using solr's
ootb features/apis, and so on. If the upcoming puzzle questions are like
that, I'm sure they will be useful.
I liked the idea.


On Tue, Apr 26, 2016 at 5:49 PM, Alexandre Rafalovitch 
wrote:

> I am doing an experiment in teaching about Solr. I've created a Solr
> puzzle and want to know whether people would find it useful to do more
> of these. My mailing list have seen this already, but I would love the
> feedback from a wider Solr audience as well. Privately or on the list.
>
> The - first - puzzle is deceptively simple:
>
> --
> Given the following sequence of commands (for Solr 5.5 or 6.0):
>
> 1. bin/solr create_core -c puzzle_date
> 2. bin/post -c puzzle_date -type text/csv -d $'today\n2016-04-08'
> 3. curl http://localhost:8983/solr/puzzle_date/select?q=Fri
>
> --
> Would the result be:
>
> 1.Error in the command 1 for not providing a configuration directory
> 2.Error in the command 2 for missing a uniqueKey field
> 3.Error in the command 2 due to an incorrect date format
> 4.No records in the command 3 output
> 5.One record in the command 3 output
> --
>
> You can find the answer and full in-depth explanation at:
> http://blog.outerthoughts.com/2016/04/solr-5-puzzle-magic-date-answer/
>
> Again, what I am trying to understand is whether that's somehow useful
> to people and worth making time to create and write-up.
>
> Any feedback would be appreciated.
>
> Regards,
> Alex.
>
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>


Re: Build Java Package for required schema and solrconfig files field and configuration.

2016-04-27 Thread Reth RM
Hi Nitin,

If I understand correctly, you have configured a suggest component in a solr
instance. A solr instance is an independent java program that runs on its
own when you start and stop it. You cannot package solr or the suggest
component inside your java application/project.

You can use the SolrJ apis in your Java project to query solr and obtain
suggestions:
http://www.solrtutorial.com/solrj-tutorial.html
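
On the wire, the SolrJ call ends up as a plain suggest request, which can be
tested with curl first (a sketch; the core name, handler path, and prefix
are placeholders matching a SuggestComponent setup):

curl "http://localhost:8983/solr/mycore/suggest?suggest=true&suggest.q=mov&wt=json"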




On Wed, Apr 27, 2016 at 10:50 AM, Nitin Solanki 
wrote:

> Hello Everyone,
>  I have created a autosuggest using Solr suggester.
> I have added a field and field type in schema.xml and did some changes in
> /suggest request handler into solrconfig.xml.
> Now, I need to build a java package using those configuration which I need
> to plug into my current java project. I don't want to use CURL, I need my
> configuration as jar or java package. How can I do ? Not having experience
> of jar package too much. Any help please...
>
> Thanks,
> Nitin
>


Re: Solr Cloud Indexing Performance degrades suddenly

2016-04-26 Thread Reth RM
What are the recent changes made to the database or DIH? A version upgrade?
Addition of new fields? Co-location of the db?


On Tue, Apr 26, 2016 at 2:47 PM, preeti kumari 
wrote:

> I am using solr 5.2.1 .
>
>
> -- Forwarded message --
> From: preeti kumari 
> Date: Mon, Apr 25, 2016 at 2:29 PM
> Subject: Solr Cloud Indexing Performance degrades suddenly
> To: solr-user@lucene.apache.org
>
>
> Hi,
>
> I have 2 solr cloud setups : Primary and secondary.
> Both are importing data from same DB . I am using DIH to index data.
>
> I was previously getting speed of 700docs/sec .
> Now suddenly primary cluster is giving me a speed of 20docs/sec.
> Same configs in Secondary is still giving 700 docs/sec speed.
> Both cluster servers are having same server specifications.
>
>
> I am looking for pointers where i can look for the reason for this degrade
> in indexing speed.
>
> Please help me out.
>
> Thanks
> Preeti
>


Re: concat 2 fields

2016-04-26 Thread Reth RM
Check whether you have also added the 'concatFields' chain definition in
solrconfig.xml.
How are you indexing, btw?


On Tue, Apr 26, 2016 at 12:24 PM, vrajesh  wrote:

> Hi,
> I have added it to the /update request handlers as per the following in
> solrconfig.xml:
> <requestHandler name="/update/json" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="stream.contentType">application/json</str>
>     <str name="update.chain">concatFields</str>
>   </lst>
> </requestHandler>
>
> <requestHandler name="/update/csv" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="stream.contentType">application/csv</str>
>     <str name="update.chain">concatFields</str>
>   </lst>
> </requestHandler>
>
> but when I query after indexing new files, I don't see any concatenated
> field.
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/concat-2-fields-tp4271760p4272829.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: regarding filter on spell checker

2016-04-25 Thread Reth RM
Could you please describe the requirement with an example? It is not clear
what you mean by index property.

On Tue, Apr 26, 2016 at 8:54 AM, Adrita G  wrote:

> Hi
>
>I want to whether we can apply any filters on spell checker.My
> requirement is like that I need to filter the suggestions based on one
> indexed property.I am trying to trying to filter queries,but the result is
> not getting filtered.Please help on this.
>
>
> Thanks & regards
> Adrita Goswami
> Tata Consultancy Services
>
>
>


Re: The Streaming API (Solrj.io) : id must have DocValues?

2016-04-25 Thread Reth RM
Hi,

So, is the concern that the same field value is stored twice, once with
stored=true and once with docValues=true? If that is the case, there is a
jira relevant to this, already fixed [1]. If you upgrade to the 5.5/6.0
version, it is possible to read non-stored fields from the docValues index;
check it out.


[1] https://issues.apache.org/jira/browse/SOLR-8220

On Mon, Apr 25, 2016 at 9:44 AM, sudsport s  wrote:

> Thanks Erik for reply,
>
> Since I was storing Id (its stored field) and after enabling docValues my
> guess is it will be stored in 2 places. also as per my understanding
> docValues are great when you have values which repeat. I am not sure how
> beneficial it would be for uniqueId field.
> I am looking at collection of few hundred billion documents , that is
> reason I really want to care about expense from design phase.
>
>
>
>
> On Sun, Apr 24, 2016 at 7:24 PM, Erick Erickson 
> wrote:
>
> > In a word, "yes".
> >
> > DocValues aren't particularly expensive, or expensive at all. The idea
> > is that when you sort by a field or facet, the field has to be
> > "uninverted" which builds the entire structure in Java's JVM (this is
> > when the field is _not_ DocValues).
> >
> > DocValues essentially serialize this structure to disk. So your
> > on-disk index size is larger, but that size is MMaped rather than
> > stored on Java's heap.
> >
> > Really, the question I'd have to ask though is "why do you care about
> > the expense?". If you have a functional requirement that has to be
> > served by returning the id via the /export handler, you really have no
> > choice.
> >
> > Best,
> > Erick
> >
> >
> > On Sun, Apr 24, 2016 at 9:55 AM, sudsport s 
> wrote:
> > > I was trying to use Streaming for reading basic tuple stream. I am
> using
> > > sort by id asc ,
> > > I am getting following exception
> > >
> > > I am using export search handler as per
> > > https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
> > >
> > > null:java.io.IOException: id must have DocValues to use this feature.
> > > at
> >
> org.apache.solr.response.SortingResponseWriter.getFieldWriters(SortingResponseWriter.java:241)
> > > at
> >
> org.apache.solr.response.SortingResponseWriter.write(SortingResponseWriter.java:120)
> > > at
> >
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:53)
> > > at
> > org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:742)
> > > at
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:471)
> > > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
> > > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
> > > at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> > > at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> > > at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > > at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> > > at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> > > at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> > > at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> > > at org.eclipse.jetty.server.session.SessionHandler.doScope(
> > >
> > >
> > > does it make sense to enable docValues for unique field? How expensive
> > is it?
> > >
> > >
> > > if I have existing collection can I update schema and optimize
> > > collection to get docvalues enabled for id?
> > >
> > >
> > > --
> > >
> > > Thanks
> >
>


Re: concat 2 fields

2016-04-25 Thread Reth RM
It should be added to the /update request handler. All the others that you
have listed here are search request handlers; you should add this one to the
/update RH.

On Mon, Apr 25, 2016 at 12:12 PM, vrajesh  wrote:

> in my solr config there are many requestHandler so i am confused in which
> requestHandler i should add it. i have some requestHandlers with names
> "/select", "/export","/query","/browse" and much more.
> i want to use this new processor chain for all type of file formats.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/concat-2-fields-tp4271760p4272564.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: concat 2 fields

2016-04-22 Thread Reth RM
Have you added this new processor chain to the update handler that you are
using (as shown below)?
<str name="update.chain">myChain</str>

https://wiki.apache.org/solr/UpdateRequestProcessor#Selecting_the_UpdateChain_for_Your_Request



On Thu, Apr 21, 2016 at 2:59 PM, vrajesh  wrote:

> I am trying to concatenate two fields to use them as one field, following
> http://grokbase.com/t/lucene/solr-user/138vr75hvj/concat-2-fields-in-another-field
> but the solution given there did not work when I tried it. Please help
> me with it.
> I am trying to concat the latitude and longitude fields into a single
> unit using the following:
>
> [updateRequestProcessorChain snippet not preserved in the archive]
>
> I added it to solrconfig.xml.
>
>  Some of my doubts are:
>  - should we define the destination field (geo_location) in schema.xml?
>
>  - I want to make this combined field (geo_location) a field facet, so I
> have to add  in
>
>  - any specific tag in which I should add the above processor config to make
> it work.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/concat-2-fields-tp4271760.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Max Query length

2016-04-22 Thread Reth RM
I'm not sure; maybe this will work (see the sketch below):
QueryResponse response = solr.query(q, METHOD.POST);

Let's wait for others' responses.
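
A minimal SolrJ sketch (the URL and collection name are illustrative);
query(SolrParams, METHOD) sends the query in an HTTP POST body rather than in
the URL, which sidesteps the URL/header length limit:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest.METHOD;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostQueryExample {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/techproducts");
    // a very long query string would exceed the 8192-byte header limit as a GET
    SolrQuery q = new SolrQuery("name:foo OR name:bar");
    QueryResponse response = solr.query(q, METHOD.POST); // forces HTTP POST
    System.out.println("numFound: " + response.getResults().getNumFound());
    solr.close();
  }
}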




On Fri, Apr 22, 2016 at 8:51 PM, Kelly, Frank <frank.ke...@here.com> wrote:

> I am using the SolrJ library - does it have a way to specify one variant
> (POST) over the other (GET)?
>
> -Frank
>
>
>
>
> On 4/22/16, 11:13 AM, "Reth RM" <reth.ik...@gmail.com> wrote:
>
> >Are you using get instead of post?
> >
> >https://dzone.com/articles/solr-select-query-get-vs-post
> >
> >
> >
> >On Fri, Apr 22, 2016 at 8:12 PM, Kelly, Frank <frank.ke...@here.com>
> >wrote:
> >
> >> I used SolrJ and wrote a test to confirm that the max query length
> >> supported by Solr (by default) was 8192 in Solr 5.3.1
> >> Based on the default Jetty settings
> >>
> >> jetty.xml: <Set name="requestHeaderSize"><Property
> >> name="solr.jetty.request.header.size" default="8192" /></Set>
> >>
> >>
> >> The test would not work however until I had used a max size of 4096 (so
> >> the query passes at 4095 and returns a RemoteSolrException at 4097).
> >>
> >>
> >> Is there another setting somewhere limiting the max query length?
> >>
> >>
> >> -Frank
> >>
> >> *Frank Kelly*
> >>
> >> Principal Software Engineer
> >>
> >> Predictive Analytics Team (SCBE/HAC/CDA)
> >>
> >>
> >> *HERE *
> >>
> >> 5 Wayside Rd, Burlington, MA 01803, USA
> >>
> >> *42° 29' 7" N 71° 11' 32" W*
> >>
> >>
> >>
> >>
>
>


Re: Solr Max Query length

2016-04-22 Thread Reth RM
Are you using get instead of post?

https://dzone.com/articles/solr-select-query-get-vs-post



On Fri, Apr 22, 2016 at 8:12 PM, Kelly, Frank  wrote:

> I used SolrJ and wrote a test to confirm that the max query length
> supported by Solr (by default) was 8192 in Solr 5.3.1
> Based on the default Jetty settings
>
> jetty.xml: <Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="8192" /></Set>
>
>
> The test would not work however until I had used a max size of 4096 (so
> the query passes at 4095 and returns a RemoteSolrException at 4097).
>
>
> Is there another setting somewhere limiting the max query length?
>
>
> -Frank
>
> *Frank Kelly*
>
> Principal Software Engineer
>
> Predictive Analytics Team (SCBE/HAC/CDA)
>
>
> *HERE *
>
> 5 Wayside Rd, Burlington, MA 01803, USA
>
> *42° 29' 7" N 71° 11' 32” W*
>
>
>    
> 
>   
>
>


Re: Wildcard query behavior.

2016-04-18 Thread Reth RM
If you search for f:validat*, then I believe you will get the same number of
results. Please check.

f:validator* searches for indexed terms with the prefix "validator", and
wildcard queries bypass analysis. With the stemmer, "validator" is stemmed to
"validate" (at index time as well as query time), so the plain query
f:validator matches records indexed under "validate"; for obvious reasons,
the numFound can differ.



On Mon, Apr 18, 2016 at 12:48 PM, Modassar Ather 
wrote:

> Hi,
>
> Please help me understand following.
>
> I have analysis chain which uses KStemFilterFactory for a field. Solr
> version is 5.4.0
>
> When I search for f:validator I get 80K+ documents whereas if I search for
> f:validator* I get only around 150 results.
>
> When I checked on analysis page I see that validator is changed to
> validate. Per my understanding in both the above cases it should at-least
> give the exact same result of around 80K+ documents.
>
> I understand in some cases wildcards can result in sub-optimal results for
> stemmed content. Please correct me if I am wrong.
>
> Thanks,
> Modassar
>


Re: dataimport db-data-config.xml

2016-04-17 Thread Reth RM
What are the errors reported? Errors can be seen either on the admin page
Logging tab or in the log file under solr_home.
If you follow the steps mentioned in the blog precisely, it should almost
work:
http://solr.pl/en/2010/10/11/data-import-handler-%E2%80%93-how-to-import-data-from-sql-databases-part-1/

If you encounter errors at any step, let us know; a minimal two-entity sketch
follows below.
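
To the original question: yes, a single dataSource can back multiple root
entities, each with its own SQL query. A minimal db-data-config.xml sketch
(the driver, URL, table and column names are illustrative, loosely based on
the snippet below):

<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/iboats"
              user="iboats" password="root"/>
  <document>
    <entity name="user1" transformer="TemplateTransformer"
            query="SELECT id, name FROM users">
      <field column="id" template="user1-${user1.id}"/>
    </entity>
    <entity name="boat" query="SELECT id, title FROM boats"/>
  </document>
</dataConfig>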




On Sat, Apr 16, 2016 at 10:49 AM, kishor  wrote:

> I am trying to run two pgsql queries on the same data source. Is this
> possible in db-data-config.xml?
>
>
> 
>
>  url="jdbc:postgresql://0.0.0.0:5432/iboats"
> user="iboats"
> password="root" />
>
> 
>  transformer="TemplateTransformer">
>
>  template="user1-${user1.id}"/>
>
>
> This code is not working; please suggest a working example.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/dataimport-db-data-config-xml-tp4270673.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Question on Solr JDBC driver with SQL client like DB Visualizer

2016-04-15 Thread Reth RM
Output of the command:

org/apache/solr/client/solrj/io/sql/
META-INF/services/java.sql.Driver
org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
org/apache/solr/client/solrj/io/sql/DriverImpl.class
org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
org/apache/solr/client/solrj/io/sql/StatementImpl.class
org/apache/solr/client/solrj/io/sql/package-info.class
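
For anyone wiring this up by hand, a minimal JDBC sketch (the zkHost,
collection and query are illustrative; the driver class to register in the
SQL client is org.apache.solr.client.solrj.io.sql.DriverImpl):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrJdbcExample {
  public static void main(String[] args) throws Exception {
    // format: jdbc:solr://<zkHost>?collection=<collectionName>
    String url = "jdbc:solr://localhost:9983?collection=techproducts";
    try (Connection con = DriverManager.getConnection(url);
         Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery("select id from techproducts limit 10")) {
      while (rs.next()) {
        System.out.println(rs.getString("id")); // print each returned id
      }
    }
  }
}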



On Fri, Apr 15, 2016 at 9:01 PM, Kevin Risden 
wrote:

> >
> > Page 11, the screenshot specifies to select a
> > "solr-solrj-6.0.0-SNAPSHOT.jar", which is equivalent to the
> > "solr-solrj-6.0.0.jar" shipped with the released version, correct?
> >
>
> Correct the PDF was generated before 6.0.0 was released. The documentation
> from SOLR-8521 is being migrated to here:
>
>
> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SQLClientsandDatabaseVisualizationTools
>
>
> > When I try adding that jar, it doesn't show the driver class; DBVisualizer
> > still shows "No new driver class". Does it mean the class is not added to
> > this jar yet?
> >
>
> I checked the Solr 6.0.0 release and the driver is there. I was testing it
> yesterday for a blog series that I'm putting together.
>
> Just for reference here is the output for the Solr 6 release:
>
> tar -tvf solr-solrj-6.0.0.jar | grep sql
> drwxrwxrwx  0 0  0   0 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/
> -rwxrwxrwx  0 0  0 842 Apr  1 14:40
> META-INF/services/java.sql.Driver
> -rwxrwxrwx  0 0  0   10124 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
> -rwxrwxrwx  0 0  0   23557 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
> -rwxrwxrwx  0 0  04459 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/DriverImpl.class
> -rwxrwxrwx  0 0  0   28333 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
> -rwxrwxrwx  0 0  05167 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
> -rwxrwxrwx  0 0  0   10451 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/StatementImpl.class
> -rwxrwxrwx  0 0  0 141 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/package-info.class
>
>
> Kevin Risden
> Apache Lucene/Solr Committer
> Hadoop and Search Tech Lead | Avalon Consulting, LLC
> 
> M: 732 213 8417
> LinkedIn  | Google+
>  | Twitter
> 
>
>
> -
> This message (including any attachments) contains confidential information
> intended for a specific individual and purpose, and is protected by law. If
> you are not the intended recipient, you should delete this message. Any
> disclosure, copying, or distribution of this message, or the taking of any
> action based on it, is strictly prohibited.
>


Question on Solr JDBC driver with SQL client like DB Visualizer

2016-04-15 Thread Reth RM
Note: I followed the steps mentioned in the pdf attached on this Jira
https://issues.apache.org/jira/browse/SOLR-8521

Page 11, the screenshot specifies to select a
"solr-solrj-6.0.0-SNAPSHOT.jar", which is equivalent to the
"solr-solrj-6.0.0.jar" shipped with the released version, correct?

When I try adding that jar, it doesn't show the driver class; DBVisualizer
still shows "No new driver class". Does it mean the class is not added to
this jar yet?


Re: Cache problem

2016-04-12 Thread Reth RM
This has answers on why giving enough memory to the OS is important:
https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
And as per the Solr admin dashboard, the OS cache (physical memory) is almost
fully utilized whereas the memory allocated to the JVM is not used, so it is
best to lower the JVM heap (see the sketch below).
Why set Xms=Xmx? This link pretty much answers it:
http://stackoverflow.com/questions/16087153/what-happens-when-we-set-xmx-and-xms-equal-size
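
For example, based on the numbers below, something along these lines (a
suggestion to experiment with, not a verified setting):

/usr/bin/java -Xms16384m -Xmx16384m [...]

leaving the remaining RAM for the OS page cache to hold the hot parts of the
46 GB index.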



On Tue, Apr 12, 2016 at 3:05 PM, Bastien Latard - MDPI AG <
lat...@mdpi.com.invalid> wrote:

> Thank you both, Bill and Reth!
>
> Here is my current options from my command to launch java:
> */usr/bin/java  -Xms20480m -Xmx40960m -XX:PermSize=10240m
> -XX:MaxPermSize=20480m [...]*
>
> So should I do *-Xms20480m -Xmx20480m* ?
> Why? What would it change?
>
> Reminder: the size of my main index is 46Gb... (80Gb all together)
>
>
>
> BTW: what's the difference between dark and light grey in the JVM
> representation? (real/virtual memory?)
>
>
> NOTE: I have only tomcat running on this server (and this is my live
> website - *i.e.: quite critical*).
>
> So if document cache is using the OS cache, this might be the problem,
> right?
> (because it seems to cache every field ==> so all the data returned by the
> query)
>
> kr,
> Bast
>
>
> On 12/04/2016 08:19, Reth RM wrote:
>
> As per the Solr admin dashboard's memory report, the Solr JVM is not using
> more than 20 GB of memory, whereas physical memory is almost full. I'd set
> Xms=Xmx=16 GB and let the operating system use the rest. Regarding caches:
> the filter cache hit ratio looks good, so it should not be a concern, and
> AFAIK the document cache actually uses the OS cache. Overall, I'd reduce the
> memory allocated to the JVM as said above and try.
>
>
>
>
> On Mon, Apr 11, 2016 at 7:40 PM, <billnb...@gmail.com> wrote:
>
>
> You do need to optimize to get rid of the deleted docs probably...
>
> That is a lot of deleted docs
>
> Bill Bell
> Sent from mobile
>
>
>
> On Apr 11, 2016, at 7:39 AM, Bastien Latard - MDPI AG
> <lat...@mdpi.com.INVALID> wrote:
>
> Dear Solr experts :),
>
> I read this very interesting post 'Understanding and tuning your Solr
>
> caches' !
>
> This is the only good document that I was able to find after searching
>
> for 1 day!
>
> I was using Solr for 2 years without knowing in details what it was
>
> caching...(because I did not need to understand it before).
>
> I had to take a look since I needed to restart (regularly) my tomcat in
>
> order to improve performances...
>
> But I now have 2 questions:
> 1) How can I know how much RAM is my solr using in real (especially for
>
> caching)?
>
> 2) Could you have a quick look into the following images and tell me if
>
> I'm doing something wrong?
>
> Note: my index contains 66 millions of articles with several text fields
>
> stored.
>
> 
>
> My solr contains several cores (all together are ~80Gb big), but almost
>
> only the one below is used.
>
> I have the feeling that a lot of data is always stored in RAM...and
>
> getting bigger and bigger all the time...
>
> 
> 
>
> (after restart)
> $ sudo tail -f /var/log/tomcat7/catalina.out | grep GC
> 
> [...] after a few minutes
> 
>
> Here are some images, that can show you some stats about my Solr
>
> performances...
>
> 
> 
> 
>
> 
>
> Kind regards,
> Bastien Latard
>
>
>
>
> Kind regards,
> Bastien Latard
> Web engineer
> --
> MDPI AG
> Postfach, CH-4005 Basel, Switzerland
> Office: Klybeckstrasse 64, CH-4057
> Tel. +41 61 683 77 35
> Fax: +41 61 302 89 18
> E-mail: lat...@mdpi.com  http://www.mdpi.com/
>
>


Re: Facet heatmaps: cluster coordinates based on average position of docs

2016-04-12 Thread Reth RM
Can you please be a bit more specific about what type of query you are making
and what other values you are expecting, with an example?

If you know of a specific JIRA for this use case, then you can add comments
there.


On Mon, Apr 11, 2016 at 5:54 PM, Anton K.  wrote:

> Anyone?
>
> Or how can i contact with facet heatmaps creator?
>
> 2016-04-07 18:42 GMT+03:00 Anton K. :
>
> > I am working with new solr feature: facet heatmaps. It works great, i
> > create clusters on my map with counts. When user click on cluster i zoom
> in
> > that area and i might show him more clusters or documents (based on
> current
> > zoom level).
> >
> > But all my cluster icons (i use round one, see screenshot below) placed
> > straight in the center of cluster's rectangles:
> >
> > https://dl.dropboxusercontent.com/u/1999619/images/map_grid3.png
> >
> > Some clusters can be in sea and so on. Also it feels not natural in my
> > case to have icons placed orderly on the world map.
> >
> > I want to place cluster's icons in average coords based on coordinates of
> > all my docs inside cluster. Is there any way to achieve this? I am trying
> > to use stats component for facet heatmap but it isn't implemented yet.
> >
>


Re: Cache problem

2016-04-12 Thread Reth RM
As per the Solr admin dashboard's memory report, the Solr JVM is not using
more than 20 GB of memory, whereas physical memory is almost full. I'd set
Xms=Xmx=16 GB and let the operating system use the rest. Regarding caches:
the filter cache hit ratio looks good, so it should not be a concern, and
AFAIK the document cache actually uses the OS cache. Overall, I'd reduce the
memory allocated to the JVM as said above and try.




On Mon, Apr 11, 2016 at 7:40 PM,  wrote:

> You do need to optimize to get rid of the deleted docs probably...
>
> That is a lot of deleted docs
>
> Bill Bell
> Sent from mobile
>
>
> > On Apr 11, 2016, at 7:39 AM, Bastien Latard - MDPI AG
>  wrote:
> >
> > Dear Solr experts :),
> >
> > I read this very interesting post 'Understanding and tuning your Solr
> caches' !
> > This is the only good document that I was able to find after searching
> for 1 day!
> >
> > I was using Solr for 2 years without knowing in details what it was
> caching...(because I did not need to understand it before).
> > I had to take a look since I needed to restart (regularly) my tomcat in
> order to improve performances...
> >
> > But I now have 2 questions:
> > 1) How can I know how much RAM is my solr using in real (especially for
> caching)?
> > 2) Could you have a quick look into the following images and tell me if
> I'm doing something wrong?
> >
> > Note: my index contains 66 millions of articles with several text fields
> stored.
> > 
> >
> > My solr contains several cores (all together are ~80Gb big), but almost
> only the one below is used.
> >
> > I have the feeling that a lot of data is always stored in RAM...and
> getting bigger and bigger all the time...
> >
> > 
> > 
> >
> > (after restart)
> > $ sudo tail -f /var/log/tomcat7/catalina.out | grep GC
> > 
> > [...] after a few minutes
> > 
> >
> > Here are some images, that can show you some stats about my Solr
> performances...
> > 
> > 
> > 
> >
> > 
> >
> > Kind regards,
> > Bastien Latard
> >
> >
>


Re: Solrj API for Managed Resources

2016-04-12 Thread Reth RM
I think it's best to use the available REST APIs (an example follows below).
Here is the list of APIs for managing synonyms and stop words:

https://cwiki.apache.org/confluence/display/solr/Managed+Resources

And this blog post has the details:
https://lucidworks.com/blog/2014/03/31/introducing-solrs-restmanager-and-managed-stop-words-and-synonyms/
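
For example, adding a synonym mapping is a PUT of a JSON map to the managed
synonyms endpoint (the collection name and the resource path
/schema/analysis/synonyms/english are illustrative); the same HTTP request
can be issued programmatically:

curl -X PUT -H 'Content-type:application/json' \
  --data-binary '{"mad":["angry","upset"]}' \
  'http://localhost:8983/solr/collection1/schema/analysis/synonyms/english'

Remember to reload the core/collection afterwards so the analyzer picks up
the change.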



On Tue, Apr 12, 2016 at 4:39 AM, iambest  wrote:

> Is there a solrj API to add synonyms or stop words using the Managed
> Resources API? I have to programmatically add them, what is the best way?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solrj-API-for-Managed-Resources-tp4269454.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Specify relative path to current core conf folder when it's originally relative to solr home

2016-04-12 Thread Reth RM
I think there are some root paths defined in the solr.sh file in the bin
directory; you can pick the root directory variable from there and use it.
For example, in solrconfig.xml there is a value such as
"${solr.install.dir:../../../..}". I think solr.install.dir is the root
path, and its definition is set in solr.sh. I'm not sure, but it's worth a
try (see also the sketch below).
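
Another option worth testing (an assumption, not verified here): Solr exposes
implicit per-core properties such as solr.core.instanceDir for substitution
in solrconfig.xml, which would allow something like:

<str name="fileDir">${solr.core.instanceDir}/conf/myfiledir</str>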




On Tue, Apr 12, 2016 at 9:34 AM, scott.chu  wrote:

> I got a custom tokenizer. When configuring it, there's an attribute
> 'fileDir', whose value is  a path relative to solr home. But I wish it can
> be relative to current core. Is there some system variable out-of-box, say
> {current_core}, that I can use in the value? For example,
>
> solr home = /solr5/server/solr
> In the current core's solrconfig.xml, I can specify
> 
> 
> myfiledir
> 
> 
>
> so it will refer to /solr5/server/solr/myfiledir.
>
> But I wanna put myfileDir under current core's conf folder. I wish there's
> something such as:
> ...
> {current_core}/conf/myfiledir
> ...
>
> Is it possible?
>


Re: search design question

2016-04-06 Thread Reth RM
Why not copy the field values of category, title, features, and spec into a
common text field and then search on that field? Otherwise, use the edismax
query parser and search the user's string across all of the above fields,
boosting the title, category, and specs fields to get relevant results (see
the sketch below).
Could you please explain why you need to form a query by recognizing the
exact field for each query term?
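
A minimal edismax sketch (the field names and boosts are illustrative):

q=nike shoes&defType=edismax&qf=title^3 category^2 specs features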



On Wed, Apr 6, 2016 at 3:07 PM, Binoy Dalal  wrote:

> I understand.
> Although I am not exactly sure how to solve this one, this should serve as
> a helpful starting point:
>
> https://lucidworks.com/resources/webinars/natural-language-search-with-solr/
>
> On Wed, 6 Apr 2016, 11:27 Midas A,  wrote:
>
> > thanks Binoy for replying ,
> >
> > i am giving you few use cases
> >
> > a)  shoes in nike  or nike shoes
> >
> > Here "nike " is brand and in this case  my query  entity is shoe and
> entity
> > type is brand
> >
> > and my result should only pink nike shoes
> >
> >
> > b)  " 32 inch  LCD TV  sony "
> >
> > 32 inch is size ,  LCD is entity type and sony is brand
> >
> >
> > in this case my solr query should be built in a different manner to get
> > accurate results.
> >
> >
> >
> >
> > Probably, now u can understand my problem.
> >
> >
> > On Wed, Apr 6, 2016 at 11:12 AM, Binoy Dalal 
> > wrote:
> >
> > > Could you describe your problem in more detail with examples of your
> use
> > > cases.
> > >
> > > On Wed, 6 Apr 2016, 11:03 Midas A,  wrote:
> > >
> > > >  i have to do entity and entity type mapping with help of search
> query
> > > > while building solr query.
> > > >
> > > > how i should i design with the solr  for search.
> > > >
> > > > Please guide me .
> > > >
> > > --
> > > Regards,
> > > Binoy Dalal
> > >
> >
> --
> Regards,
> Binoy Dalal
>


Re: SolrCloud backup/restore

2016-04-05 Thread Reth RM
Yes. It should work by backing up each shard leader of the collection: for
each collection, for each shard, find the leader and request a backup command
on it (see the sketch below). To restore, create a new collection, restore
each backup onto the corresponding shard, and then add new replicas, which
will duly pull the index from the newly restored shards.
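
Hedged examples of the per-core replication-handler commands involved (the
host, core names, and paths are illustrative):

curl 'http://host:8983/solr/mycoll_shard1_replica1/replication?command=backup&location=/backups&name=shard1'

curl 'http://host:8983/solr/newcoll_shard1_replica1/replication?command=restore&location=/backups&name=shard1'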


On Mon, Apr 4, 2016 at 10:32 PM, Zisis Tachtsidis 
wrote:

> I've tested backup/restore successfully in a SolrCloud installation with a
> single node (no replicas). This has been achieved in
> https://issues.apache.org/jira/browse/SOLR-6637
> Can you do something similar when more replicas are involved? What I'm
> looking for is a restore command that will restore index in all replicas of
> a collection.
> Judging from the code in /ReplicationHandler.java/ and
> https://issues.apache.org/jira/browse/SOLR-5750 I assume that more work
> needs to be done to achieve this.
>
> Is my understanding correct? If the situation is like this I guess an
> alternative would be to just create a new collection, restore index and
> then
> add replicas. (I'm using Solr 5.5.0)
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-backup-restore-tp4267954.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to Get info about clusterstate in solr 5.2.1 just like ping request handler with distrib=true

2016-04-05 Thread Reth RM
Have you already looked at the cluster status API (example below)?
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18
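
For example (the host is illustrative):

curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS'

The response lists the live nodes and the state of every collection, shard,
and replica, which a load-balancer health check could parse.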


On Tue, Apr 5, 2016 at 10:09 AM, preeti kumari 
wrote:

> Hi,
>
> I am using solr 5.2.1 . We need to configure F5 load balancer with
> zookeepers.
> For that we need to know whether our cluster as a whole is eligible to
> serve queries or not. We can get cluster state using ping request handler
> but in solr 5.2.1 with distrib=true it gives exception(known bug in solr
> 5.2.1). So now I need :
>
> 1. Any way to get cluster state as a whole to see if cluster can serve
> queries without going to individual solr nodes.
> 2. If we can anyhow get this info from zookeepers
> 3. can we make ping request handler with distrib=true work in solr 5.2.1
>
> Any info in this regard would be appreciated where i don't want to go to
> individual solr nodes.
>
> Thanks
> Preeti
>


Re: How to implement Autosuggestion

2016-04-03 Thread Reth RM
There is a payload attribute (payloadField in the suggester config), but I'm
not sure if it can be used for such a use case; let's wait for other
contributors to confirm (a sketch follows below).
A similar question is posted here:
http://stackoverflow.com/questions/32434186/solr-suggestion-with-multiple-payloads

If it's just a category that you need, then a workaround (although not an
accurate one) that I can think of is to include the category value in the
same field, pipe-separated, and extract it from there.
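
For reference, a sketch of a suggester carrying a payload (the field names
are assumptions; whether one payload can cleanly carry multiple categories is
exactly the open question above):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">productSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="field">product_name</str>
    <str name="payloadField">category</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>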

On Sun, Apr 3, 2016 at 11:41 AM, chandan khatri 
wrote:

> Hi All,
>
> I've a query regarding autosuggestion. My use case is as below:
>
> 1. User enters product name (say Nokia)
> 2. I want suggestions along with the category to which the product
> belongs (e.g. Nokia belongs to the "electronics" and "mobile" categories), so I
> want suggestions like "Nokia in electronics" and "Nokia in mobile".
>
> I am able to get the suggestions using the OOTB AnalyzingInFixSuggester but
> not sure how I can get the category along with the suggestion(can this
> category be considered as facet of the suggestion??)
>
> Any help/pointer is highly appreciated.
>
> Thanks,
> Chandan
>


Re: most popular collate spellcheck

2016-04-03 Thread Reth RM
Maybe open a JIRA issue under 'improvement':
https://issues.apache.org/jira/login.jsp?


On Sat, Apr 2, 2016 at 11:30 PM, michael solomon <micheal...@gmail.com>
wrote:

> Thanks, and what can we do about that?
> On Apr 2, 2016 5:28 PM, "Reth RM" <reth.ik...@gmail.com> wrote:
>
> > AFAIK, such a feature doesn't exist currently, but it looks like a nice one to have.
> >
> >
> >
> >
> > On Thu, Mar 31, 2016 at 8:33 PM, michael solomon <micheal...@gmail.com>
> > wrote:
> >
> > > Hi,
> > > It's possible to return the most popular collate?
> > > i.e:
> > > spellcheck.q = prditive analytiycs
> > > spellcheck.maxCollations = 5
> > > spellcheck.count=0
> > > response:
> > > 
> > >   
> > >   false
> > >   
> > > positive analytic
> > > positive analytics
> > > predictive analytics
> > > primitive analytics
> > > punitive analytic
> > >   
> > > 
> > >
> > > I want the collations to be ordered by numFound; obviously
> > > "predictive analytics" has more results than "positive analytic".
> > > Thanks,
> > > Michael
> > >
> >
>


Re: most popular collate spellcheck

2016-04-02 Thread Reth RM
AFAIK, such a feature doesn't exist currently, but it looks like a nice one to have.




On Thu, Mar 31, 2016 at 8:33 PM, michael solomon 
wrote:

> Hi,
> It's possible to return the most popular collate?
> i.e:
> spellcheck.q = prditive analytiycs
> spellcheck.maxCollations = 5
> spellcheck.count=0
> response:
> 
>   
>   false
>   
> positive analytic
> positive analytics
> predictive analytics
> primitive analytics
> punitive analytic
>   
> 
>
> I want the collations to be ordered by numFound; obviously
> "predictive analytics" has more results than "positive analytic".
> Thanks,
> Michael
>


Re: How to implement Autosuggestion

2016-03-28 Thread Reth RM
Solr AnalyzingInfix suggester component:
https://lucidworks.com/blog/2015/03/04/solr-suggester/



On Mon, Mar 28, 2016 at 7:57 PM, Mugeesh Husain  wrote:

> Hi,
>
> I am looking for the best way to implement autosuggestion in ecommerce
> using solr or elasticsearch.
>
> I guess using ngram analyzer is not a good way if data is big.
>
>
> Please suggest me any link or your opinion ?
>
>
>
> Thanks
> Mugeesh
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-implement-Autosuggestion-tp4266434.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Problem in Issuing a Command to Upload Configuration

2016-03-28 Thread Reth RM
I think it should be "zkcli.bat" (all in lower case) that is shipped with
solr not zkCli.cmd(that is shipped with zookeeper)

solr_home/server/scripts/cloud-scripts/zkcli.bat -zkhost 127.0.0.1:9983 \
   -cmd upconfig -confname my_new_config -confdir
server/solr/configsets/basic_configs/conf

On Mon, Mar 28, 2016 at 8:18 PM, Salman Ansari 
wrote:

> Hi,
>
> I am facing issue uploading configuration to Zookeeper ensemble. I am
> running this on Windows as
>
> *Command*
> **
> zkCli.cmd -cmd upconfig -zkhost
> "[localserver]:2181,[second_server]:2181,[third_server]:2181" -confname
> [config_name]  -confdir "[config_dir]"
>
> and I got the following result
>
> *Result*
> =
> Connecting to localhost:2181
> 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
> 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:host.name=SabrSolrServer1.SabrSolrServer1.a2.internal.cloudapp.net
> 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:java.version=1.8.0_77
> 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:java.vendor=Oracle Corporation
> 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:java.home=C:\Program Files\Java\jre1.8.0_77
> 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> environm
>
> ent:java.class.path=C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\build\classes;C:\So
>
> lr\Zookeeper\zookeeper-3.4.6\bin\..\build\lib\*;C:\Solr\Zookeeper\zookeeper-3.4.
>
> 6\bin\..\zookeeper-3.4.6.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\jline-
>
> 0.9.94.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\log4j-1.2.16.jar;C:\Solr
>
> \Zookeeper\zookeeper-3.4.6\bin\..\lib\netty-3.7.0.Final.jar;C:\Solr\Zookeeper\zo
>
> okeeper-3.4.6\bin\..\lib\slf4j-api-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\b
>
> in\..\lib\slf4j-log4j12-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\conf
> 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> environm
>
> ent:java.library.path=C:\ProgramData\Oracle\Java\javapath;C:\Windows\Sun\Java\bi
>
> n;C:\Windows\system32;C:\Windows;C:\ProgramData\Oracle\Java\javapath;C:\Windows\
>
> system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShe
> ll\v1.0\;C:\Program Files\Java\JDK\bin;.
> 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:java.io.tmpdir=C:\Users\ADMIN_~1\AppData\Local\Temp\2\
> 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:java.compiler=
> 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:os.name=Windows Server 2012 R2
> 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:os.arch=amd64
> 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:os.version=6.3
> 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:user.name=admin_user
> 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:user.home=C:\Users\admin_user
> 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> environm
> ent:user.dir=C:\Solr\Zookeeper\zookeeper-3.4.6\bin
> 2016-03-28 14:40:12,865 [myid:] - INFO  [main:ZooKeeper@438] - Initiating
> client
>  connection, connectString=localhost:2181 sessionTimeout=3
> watcher=org.apach
> e.zookeeper.ZooKeeperMain$MyWatcher@506c589e
>
> It looks like it is not even executing the command. Any idea why that is
> happening?
>
> Regards,
> Salman
>


Re: scottchu] How to rebuild master-slave multi-core with schema.xml from old verison in Solr 5.5

2016-03-28 Thread Reth RM
Hi Scott,

It is the same as how we would do it in earlier versions of Solr.

On the master instance, include the replication handler definition with the
master configs (as shown below):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">optimize</str>
    <str name="backupAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
  </lst>
</requestHandler>

And on the slave instance, add the master URL under the slave config:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://localhost:8983/solr/techproducts/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>

Documentation is here
https://cwiki.apache.org/confluence/display/solr/Index+Replication



On Mon, Mar 28, 2016 at 8:19 AM, scott.chu  wrote:

>
> I posted a question, "How to rebuild master-slave multi-core with schema.xml
> from old version in Solr 5.5", on Stack Overflow, hoping some experienced Solr
> people can reply with a helpful answer. The URL is:
> http://stackoverflow.com/questions/36254855/how-to-rebuild-master-slave-multi-core-with-schema-xml-from-old-verison-in-solr
>
> scott.chu,scott@udngroup.com
> 2016/3/28 (週一)
>


Re: score mixing

2016-03-27 Thread Reth RM
If you are looking to boost the document score based on the value of the
rank field, then you can as well use field boosting:
rank^10. For the other case of combining scores and rank values, using a
"function query" should serve the requirement (see the sketch below).

https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
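
For example, hedged edismax sketches (the parameter values are illustrative),
where bf adds the rank value to the score and boost multiplies the score by it:

q=my query&defType=edismax&bf=field(rank)
q=my query&defType=edismax&boost=field(rank)
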

On Sun, Mar 27, 2016 at 2:27 PM, michael solomon 
wrote:

> Hi,
> I have nested documents and use the BlockJoinQueryParser.
> In the parent documents I have a "rank" field that gives an arbitrary score
> for each parent.
> Is it possible to mix the original scoring with mine? i.e.:
> SolrScore + rank = final score
> or (proportional scoring) SolrScore/MaxScore + rank/MaxRank = final
> score (between 0 and 1)
> Thanks,
> Michael
>


Re: Solr to Production

2016-03-27 Thread Reth RM
Is the website deployed on the same machine where Solr is running? If not,
check whether the port is blocked by firewall protection. What is the
response message that you are receiving?



On Sun, Mar 27, 2016 at 3:16 PM, Adel Mohamed Khalifa <
a.moha...@saudisoft.com> wrote:

> Hello All,
>
>
>
> I installed the Solr server on my Ubuntu machine, and when I use it directly
> it runs fine, but when I use it remotely from my website it does not run,
> and I don't
> know the reason. Can you help me, please?
>
>
>
> Regards,
> Adel Khalifa
>
>
>
>


Re: Issue With Manual Lock

2016-03-24 Thread Reth RM
Hi Salman,

The index lock error is generally reported when an index directory is shared
between more than one core or Solr instance. Please check whether more than
one of those cores points to the same data directory; you can see the
directory path on the "Overview" tab of the admin page.




On Wed, Mar 23, 2016 at 1:59 PM, Salman Ansari 
wrote:

> Hi,
>
> I am facing an issue which I believe has something to do with recent
> changes in Solr. I already have a collection spread on 2 shards (each with
> 2 replicas).  What happened is that my Solr and Zookeeper ensemble went
> down and I restarted the servers. I have performed the following steps
>
> 1) I restarted the machine and performed Windows update
> 2) I started Zookeeper ensemble
> 3) Then I started Solr instances
>
> My issues are (for collections which existed before starting Solr servers)
>
> 1) From time to time, I see some replicas are down on Solr dashboard
> 2) When I try to index some documents, I faced the following exception
>
> SolrNet.Exceptions.SolrConnectionException was unhandled by user code
>
>   HResult=-2146232832
>
>   Message=
>
> <response><lst name="responseHeader"><int name="status">500</int><int
> name="QTime">1021</int></lst><str
> name="msg">{msg=SolrCore '[myCollection]_shard1_replica1' is not available
> due to init failure: Index locked for write for core
> '[myCollection]_shard1_replica1'. Solr now longer supports forceful
> unlocking via 'unlockOnStartup'. Please verify locks
> manually!,trace=org.apache.solr.common.SolrException: SolrCore
> '[myCollection]_shard1_replica1' is not available due to init failure:
> Index locked for write for core '[myCollection]_shard1_replica1'. Solr now
> longer supports forceful unlocking via 'unlockOnStartup'. Please verify
> locks manually!
>
> I have tried several steps including
>
> 1) I have removed write.lock file manually from the folders while Solr is
> up and I have tried reloading the core while the Solr is up as well but
> nothing changed (still some replicas are down)
> 2) I have restarted Solr instances but now all replicas are down :)
>
> Any idea how to handle this issue?
>
> Appreciate your comments/feedback.
>
> Regards,
> Salman
>


Re: Merge two Solr documents into One

2016-03-23 Thread Reth RM
As far as I know, there are no such OOTB utils available, but I could be
wrong; let's wait for others' thoughts as well.
Another way of dealing with this requirement is to write a custom update
processor (a skeleton sketch follows below):
https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
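
A minimal skeleton sketch of such a processor; the actual merge logic is left
as a comment because it depends entirely on your schema and on how the
related documents reach Solr:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class MergeDocsProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // accumulate/merge fields from the related source documents into 'doc' here
        super.processAdd(cmd); // pass the merged document down the chain
      }
    };
  }
}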

On Wed, Mar 23, 2016 at 5:43 PM, solr2020  wrote:

> Hi,
>
> I have 2-3 Solr documents, but I would like to merge all of them into one
> document while indexing, something like parent-child. Do we have any
> utils in Solr to merge two or more SolrInputDocuments into one
> SolrInputDocument?
>
> Thanks,
> Gomathi.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Merge-two-Solr-documents-into-One-tp4265528.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Indexing multiple pdf's and partial update of pdf

2016-03-23 Thread Reth RM
Are you using the Apache Tika parser to parse the pdf files?

1) Solr supports parent-child block joins, with which you can index more than
one file's data within a document object (if that is what you are looking for):
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers

2) If the unique key of the document that you are reindexing equals that of a
document already in the index, the existing document will be overwritten. If
you'd like to do partial updates via curl, some examples are listed here (see
also the sketch below):
http://yonik.com/solr/atomic-updates/
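
A hedged sketch of a partial (atomic) update via curl (the core name, id, and
field are illustrative; note that atomic updates require all fields to be
stored):

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycore/update?commit=true' \
  --data-binary '[{"id":"doc1","title":{"set":"new title"}}]'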





On Thu, Mar 24, 2016 at 3:43 AM, Jay Parashar  wrote:

> Hi,
>
> I have couple of questions regarding indexing files (say pdf).
>
> 1)  Is there any way to index more than one file to one document with
> a unique id?
>
> One way, I think, is to do an "extractOnly" of all the documents and then
> index those extracts separately. Is there an easier way?
>
> 2)  If my Solr document has existing fields populated and then I index
> a pdf, it seems it overwrites the document with the end result being just
> the contents of the pdf. I know we can do partial updates using SolrJ but
> is it possible to do partial updates of pdf using curl?
>
>
> Thanks
> Jay
>