Re: indexing rich data from directory using solarium

2015-12-02 Thread Gora Mohanty
On 2 December 2015 at 21:55, kostali hassan  wrote:
> Yes, there is an error in my Solr logs:
> SolrException URLDecoder: Invalid character encoding detected after
> position 79 of query string / form data (while parsing as UTF-8)
> 
> This is my post on Stack Overflow:
> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79

Looks like an encoding error all right. Are you very sure that you can
successfully POST the same document with SimplePostTool? If so, I would
guess that you are not using Solarium correctly, i.e., the PDF file is
getting POSTed such that Solr is getting the raw content rather than
the extracted content.
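The diagnosis above can be demonstrated in a few lines: a parser that expects UTF-8 (as Solr's URLDecoder does for form data) fails at the first raw binary byte, which is exactly what happens when PDF content ends up in the form body instead of a multipart file part. A minimal sketch (the bytes are made up for illustration, not taken from the failing request):

```python
# A byte sequence like the start of a PDF body: 0x93 is not a valid UTF-8 start byte.
raw_pdf_prefix = b"%PDF-1.4\n\x93\xe2\xfa"

try:
    raw_pdf_prefix.decode("utf-8")
except UnicodeDecodeError as exc:
    # Mirrors Solr's "Invalid character encoding detected after position N" error
    print("invalid character encoding at position", exc.start)
```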

Regards,
Gora


Re: indexing rich data from directory using solarium

2015-12-02 Thread Gora Mohanty
On 2 December 2015 at 21:59, Erik Hatcher  wrote:
> Gora -
>
> SimplePostTool actually already adds the literal.id parameter* when in “auto” 
> mode (and it’s not an XML, JSON, or CSV file).

Ah, OK. It has been a while since I actually used the tool. Thanks for the info.

Regards,
Gora


Re: Block Joins

2015-12-02 Thread Rick Leir
Hi Mikhail
Sorry, I should have noted "that" is a word in the OCR text that I have
indexed.

What do I want to achieve? The Block Joins we have been discussing were
giving me scores of 0.0, and I would like to get something a wee bit better
than that (not looking for accuracy yet).

In the query below, I got some scores by putting {!score in the parent
clause, but got no change when putting {!score in the child clause. What is
happening here? I will enable "InfoStream" and post the results.
Thanks -- Rick

On Tue, Dec 1, 2015 at 5:21 PM, 
wrote:

>
> fl=score,[child parentFilter=type_s:book childFilter={!score=avg}that],
>
> This childFilter value doesn't make sense. What do you want to achieve?
>
> On Tue, Dec 1, 2015 at 7:28 PM, Rick Leir 
> wrote:
>
> > Hi all,
> > Scoring is confusing me. Is the following correct?
> >
> > $ curl http://localhost:8983/solr/dorsetdata/query -d '
> > q={!parent which="content_type:parentDocument" score=max} type_s:page AND
> > that&
> > wt=json&indent=true&
> > fl=score,[child parentFilter=type_s:book childFilter={!score=avg}that
> > ],canonicalMaster,title,publisher,[docid]'
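For later readers, a hedged sketch of the shape this query usually takes once the local params are untangled: the {!score=...} local param belongs in the {!parent} query parser, while childFilter in the [child] transformer takes a plain filter query. Field and core names below come from the thread; the rest is an assumption, not a tested query:

```
curl http://localhost:8983/solr/dorsetdata/query -d '
q={!parent which="content_type:parentDocument" score=max}(type_s:page AND that)&
fl=score,[child parentFilter=type_s:book childFilter=type_s:page],canonicalMaster,title&
wt=json&indent=true'
```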


migrate(or copy) data from one core1(node2) to anothere core2(node1)

2015-12-02 Thread Mugeesh Husain
Hello,

I have two Solr instances: one is running in Solr (non-cloud) mode, and the
other in SolrCloud mode.

Data is indexed in the non-cloud Solr instance; now I have created/defined the
same core with the same schema in the SolrCloud instance.

I want to transfer/copy the data from the non-cloud core to my SolrCloud core.

Which way do I have to do this?

Note: the index size is only 4.97 GB.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/migrate-or-copy-data-from-one-core1-node2-to-anothere-core2-node1-tp4243159.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: highlight

2015-12-02 Thread Rick Leir
For performance, if you have many large documents, you want to index the
whole document but only store some identifiers. (Maybe this is not a
consideration for you, stop reading now )

If you are not storing the whole document, then Solr cannot do the
highlighting.  You would get an id, then locate your source document (maybe
in your filesystem) and do highlighting yourself.
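As a sketch of the trade-off described above (field and type names here are illustrative assumptions, not taken from the thread), an indexed-but-not-stored body field next to a stored identifier would look like this in schema.xml:

```xml
<!-- searchable but not retrievable: small stored footprint, but Solr cannot highlight it -->
<field name="body" type="text_general" indexed="true" stored="false"/>
<!-- stored identifier used to locate the source document for client-side highlighting -->
<field name="doc_path" type="string" indexed="false" stored="true"/>
```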

> Can anyone offer any solutions for searching large documents and
returning a
> single phrase highlight?


Method to fix issue when you get KeeperErrorCode = NoAuth when Zookeeper ACL enabled

2015-12-02 Thread Jeff Wu
We have been following this wiki to enable ZooKeeper ACL control:
https://cwiki.apache.org/confluence/display/solr/ZooKeeper+Access+Control#ZooKeeperAccessControl-AboutZooKeeperACLs

It works fine for the Solr service itself, but when you try to
use scripts/cloud-scripts/zkcli.sh to put a zNode, it throws this exception:

   Exception in thread "main"
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode =
NoAuth for /security.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
at
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:362)
at
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:359)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:359)

To fix the problem, the wiki needs to be updated (however, I cannot post a
comment on that wiki):

SOLR_ZK_CREDS_AND_ACLS="-DzkACLProvider=org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider
-DzkCredentialsProvider=org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider
-DzkDigestUsername=admin-user -DzkDigestPassword=admin-password
-DzkDigestReadonlyUsername=readonly-user
-DzkDigestReadonlyPassword=readonly-password"


I think the reason is that zkcli.sh is a standard JVM process and does not read
settings from solr.xml, so we must explicitly provide these parameters in
order to make ZK ACLs work.

Could someone help notify the wiki editor to update this? Right now the wiki
leads people to a dead end when using zkcli.sh to put content into a ZK
ensemble with ACLs enabled.
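A hedged sketch of how the fixed variable is then used: export it in the shell (or in solr.in.sh) before invoking zkcli.sh, which passes it through to the JVM. The zkhost and file names below are placeholders:

```shell
export SOLR_ZK_CREDS_AND_ACLS="-DzkACLProvider=org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider \
  -DzkCredentialsProvider=org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider \
  -DzkDigestUsername=admin-user -DzkDigestPassword=admin-password"
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd putfile /security.json security.json
```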


Re: Block Joins

2015-12-02 Thread Mikhail Khludnev
Rick,
Would you mind posting the exact query params and response, and letting us
know what you expect? Otherwise, it's hard to understand the problem.

On Wed, Dec 2, 2015 at 5:44 PM, Rick Leir  wrote:

> Hi Mikhail
> Sorry, I should have noted "that" is a word in the OCR text that I have
> indexed.
>
> What do I want to achieve? The Block Joins we have been discussing were
> giving me scores of 0.0, and I would like to get something a wee bit better
> than that (not looking for accuracy yet).
>
> In the query below, I got some scores by putting {!score in the parent
> clause, but got no change when putting {!score in the child clause. What is
> happening here? I will enable "InfoStream" and post the results.
> Thanks -- Rick
>
> On Tue, Dec 1, 2015 at 5:21 PM, 
> wrote:
>
> >
> > fl=score,[child parentFilter=type_s:book childFilter={!score=avg}that],
> >
> > This childFilter value doesn't make sense. What do you want to achieve?
> >
> > On Tue, Dec 1, 2015 at 7:28 PM, Rick Leir 
> > wrote:
> >
> > > Hi all,
> > > Scoring is confusing me. Is the following correct?
> > >
> > > $ curl http://localhost:8983/solr/dorsetdata/query -d '
> > > q={!parent which="content_type:parentDocument" score=max} type_s:page AND
> > > that&
> > > wt=json&indent=true&
> > > fl=score,[child parentFilter=type_s:book childFilter={!score=avg}that
> > > ],canonicalMaster,title,publisher,[docid]'
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


Re: highlight

2015-12-02 Thread Teague James
Hello,

Thanks for replying! Yes, I am storing the whole document. The document is 
indexed with a unique id. There are only 3 fields in the schema - id, 
rawDocument, tikaDocument. Search uses the tikaDocument field. Against this I 
am throwing 2-5 word phrases and getting highlighting matches to each 
individual word in the phrases instead of just the phrase. The highlighted text 
that is matched is read by another application for display in the front end UI. 
Right now my app has logic to figure out that multiple highlights indicate a 
phrase, but it isn't perfect. 

In this case Solr is reporting a single 3-word phrase as 2 hits: one with 2 of 
the phrase words, the other with 1 of the phrase words. This only happens in 
large documents where the multi-word phrase appears across the boundary of one 
of the document fragments that Solr is analyzing (this is a hunch - I really 
don't know the mechanics for certain, but the next statement makes evident how 
I came to this conclusion). However, if I make a one-sentence document with the 
same multi-word phrase, Solr will report 1 hit with all three words 
individually highlighted. At the very least I know Solr is getting the phrase 
correct. What I'm trying to fix is the method of highlighting (I'm trying to get 
one set of tags per phrase) and the occasional breaking of a single phrase into 2 hits.

Given that setup, what do you recommend? I'm not sure I understand the approach 
you're describing. I appreciate the help!

-Teague James

> On Dec 2, 2015, at 10:09 AM, Rick Leir  wrote:
> 
> For performance, if you have many large documents, you want to index the
> whole document but only store some identifiers. (Maybe this is not a
> consideration for you, stop reading now )
> 
> If you are not storing the whole document, then Solr cannot do the
> highlighting.  You would get an id, then locate your source document (maybe
> in your filesystem) and do highlighting yourself.
> 
>> Can anyone offer any solutions for searching large documents and
> returning a
>> single phrase highlight?
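For what it's worth, the stock highlighter exposes parameters aimed at both symptoms described in this thread; a hedged sketch of the relevant request parameters (the field name is taken from the message above; whether hl.fragsize=0 is acceptable depends on how large the stored field is):

```
hl=true
hl.fl=tikaDocument
hl.usePhraseHighlighter=true
hl.mergeContiguous=true
hl.fragsize=0
```

hl.usePhraseHighlighter makes phrase queries highlight as phrases rather than word-by-word, hl.mergeContiguous collapses adjacent highlighted terms into a single tag pair, and hl.fragsize=0 treats the whole field as one fragment, which removes the fragment-boundary splitting at the cost of a larger response.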


Re: indexing rich data from directory using solarium

2015-12-02 Thread Erik Hatcher
Gora - 

SimplePostTool actually already adds the literal.id parameter* when in “auto” 
mode (and it’s not an XML, JSON, or CSV file).

Erik


* See 
https://github.com/apache/lucene-solr/blob/d4762c1a2677a44c8a580b97239e1e91a25d/solr/core/src/java/org/apache/solr/util/SimplePostTool.java#L786
 



—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 



> On Dec 2, 2015, at 11:18 AM, Gora Mohanty  wrote:
> 
> On 2 December 2015 at 17:16, kostali hassan  wrote:
>> yes its logic Thank you , but i want understand why the same data is
>> indexing fine in shell using windows SimplePostTool :
>>> 
>>> D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar -Dauto=yes
>>> -Dc=solr_docs_core -Ddata=files -Drecursive=yes
>>> org.apache.solr.util.SimplePostTool D:\Lucene\document ;
> 
> That seems strange. Are you sure that you are posting the same PDF.
> With SimplePostTool, you should be POSTing to the URL
> /solr/update/extract?literal.id=myid , i.e., you need an option of
> something like:
> -Durl=http://localhost:8983/solr/update/extract?literal.id=myid in the
> command line for SimplePostTool.
> 
> Likewise, I am not that familiar with Solarium. Are you sure that the
> file is being POSTed to /solr/update/extract . Are you seeing any
> errors in your Solr logs?
> 
> Regards,
> Gora



SOLR-2798 (local params parsing issue) -- how can I help?

2015-12-02 Thread Demian Katz
Hello,

I'd really love to see a resolution to SOLR-2798, since my application has a 
bug that cannot be addressed until this issue is fixed.

It occurred to me that there's a good chance that the code involved in this 
issue is relatively isolated and testable, so I might be able to help with a 
solution even though I have no prior experience with the Solr code base. I'm 
just wondering if anyone can confirm this and, if so, point me in the right 
general direction so that I can make an attempt at a patch.

I asked about this a while ago in a comment on the JIRA ticket, but I have a 
feeling that nobody actually saw that - so I'm trying again here on the mailing 
list.

Any and all help greatly appreciated - and hopefully if you help me a little, I 
can contribute a useful fix back to the project in return.

thanks,
Demian


Re: indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
The problem with posting using the command line is:

I started working with Solr 5.3.1 by extracting Solr into D:\solr and running
the Solr server with:

D:\solr\solr-5.3.1\bin>solr start

Then I created a core in standalone mode:

D:\solr\solr-5.3.1\bin>solr create -c mycore

I need to index file-system files (Word and PDF), and the schema API doesn't
have a "name" field for the document, so I added this field using curl:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field":{
    "name":"name",
    "type":"text_general",
    "stored":true,
    "indexed":true
  }
}' http://localhost:8983/solr/mycore/schema



And re-indexed all documents with the Windows SimplePostTool:

D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar -Dauto=yes
-Dc=mycore -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool
D:\Lucene\document

But even though the field "name" is successfully added, it is empty; the field
"title" gets the name only for PDF documents, not for MS Word (.doc and .docx).

Then I chose to index with the techproducts example, because it doesn't use
the schema API, so I can modify schema.xml:

D:\solr\solr-5.3.1>solr -e techproducts

techproducts returns the name of all the .xml files indexed.

Then I created a new core based on the solr_home example/techproducts/solr, and
I used schema.xml (which contains the field "name") and solrconfig.xml from
techproducts in this new core, called demo.

When I indexed all the documents, the field "name" exists but is still empty
for every document indexed.

My question is: how can I get just the name of each document (MS Word and PDF),
not the path as in the field "id" or the field "ressource_name"? Do I have to
create a new field type, or is there another way?

2015-12-02 16:25 GMT+00:00 kostali hassan :

> yes they are a Error in my solr logs:
> SolrException URLDecoder: Invalid character encoding detected after
> position 79 of query string / form data (while parsing as UTF-8)
> 
> this is my post in stack overflow :
>
> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
>
> 2015-12-02 16:18 GMT+00:00 Gora Mohanty :
>
>> On 2 December 2015 at 17:16, kostali hassan 
>> wrote:
>> > yes its logic Thank you , but i want understand why the same data is
>> > indexing fine in shell using windows SimplePostTool :
>> >>
>> >> D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar
>> -Dauto=yes
>> >> -Dc=solr_docs_core -Ddata=files -Drecursive=yes
>> >> org.apache.solr.util.SimplePostTool D:\Lucene\document ;
>>
>> That seems strange. Are you sure that you are posting the same PDF.
>> With SimplePostTool, you should be POSTing to the URL
>> /solr/update/extract?literal.id=myid , i.e., you need an option of
>> something like:
>> -Durl=http://localhost:8983/solr/update/extract?literal.id=myid in the
>> command line for SimplePostTool.
>>
>> Likewise, I am not that familiar with Solarium. Are you sure that the
>> file is being POSTed to /solr/update/extract . Are you seeing any
>> errors in your Solr logs?
>>
>> Regards,
>> Gora
>>
>
>


Re: migrate(or copy) data from one core1(node2) to anothere core2(node1)

2015-12-02 Thread Erick Erickson
Just shut down the SolrCloud instance and copy the index from the
non-cloud to cloud directory. Then bring the cloud instance up. You
should then be fine. This assumes that your SolrCloud instance has
only one shard, which is what I expect given your index size.

After that's done, and assuming you want to add replicas in the
SolrCloud version for HA/DR/Performance reasons, use the ADDREPLICA
Collections API command.

Please try to think about "collections" rather than "cores" when
working with SolrCloud; it'll save grief later on ;)... When you say
you created the core in SolrCloud, it worries me. You should use the
Collections API to create a _collection_, NOT fiddle with creating
directories and the like, and most certainly not use the _Core_ admin
API. It will be a single-shard collection...

Best,
Erick
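A hedged sketch of the copy described above; every path and name below is a placeholder, and the replica directory layout varies by install:

```shell
# stop the SolrCloud node
bin/solr stop -p 8983

# copy the index from the non-cloud core into the cloud replica's data dir
cp -r /path/to/standalone/solr/mycore/data/index \
      /path/to/cloud/node/solr/mycollection_shard1_replica1/data/

# bring the cloud node back up
bin/solr start -cloud -p 8983 -z localhost:2181

# then, if desired, add a replica via the Collections API
curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1'
```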



On Wed, Dec 2, 2015 at 6:51 AM, Mugeesh Husain  wrote:
> Hello,
>
> I have a 2 solr instance, one is running in solr(non cloud),another one is
> solrcloud mode.
>
> data is indexed in solr mode(non -cloud),now i have creates/define same core
> with same schema in solrcloud instance.
>
> I want to transfer/copy data from one core(non-cloud) to my solrcloud core.
>
> On which way i have to do it.
>
> Note: Index  size is 4.97 GB only.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/migrate-or-copy-data-from-one-core1-node2-to-anothere-core2-node1-tp4243159.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
Yes, there is an error in my Solr logs:
SolrException URLDecoder: Invalid character encoding detected after
position 79 of query string / form data (while parsing as UTF-8)

This is my post on Stack Overflow:
http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79

2015-12-02 16:18 GMT+00:00 Gora Mohanty :

> On 2 December 2015 at 17:16, kostali hassan 
> wrote:
> > yes its logic Thank you , but i want understand why the same data is
> > indexing fine in shell using windows SimplePostTool :
> >>
> >> D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar
> -Dauto=yes
> >> -Dc=solr_docs_core -Ddata=files -Drecursive=yes
> >> org.apache.solr.util.SimplePostTool D:\Lucene\document ;
>
> That seems strange. Are you sure that you are posting the same PDF.
> With SimplePostTool, you should be POSTing to the URL
> /solr/update/extract?literal.id=myid , i.e., you need an option of
> something like:
> -Durl=http://localhost:8983/solr/update/extract?literal.id=myid in the
> command line for SimplePostTool.
>
> Likewise, I am not that familiar with Solarium. Are you sure that the
> file is being POSTed to /solr/update/extract . Are you seeing any
> errors in your Solr logs?
>
> Regards,
> Gora
>


Re: indexing rich data from directory using solarium

2015-12-02 Thread Gora Mohanty
On 2 December 2015 at 17:16, kostali hassan  wrote:
> Yes, that's logical, thank you, but I want to understand why the same data
> indexes fine in the shell using the Windows SimplePostTool:
>>
>> D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar -Dauto=yes
>> -Dc=solr_docs_core -Ddata=files -Drecursive=yes
>> org.apache.solr.util.SimplePostTool D:\Lucene\document ;

That seems strange. Are you sure that you are posting the same PDF?
With SimplePostTool, you should be POSTing to the URL
/solr/update/extract?literal.id=myid , i.e., you need an option
something like
-Durl=http://localhost:8983/solr/update/extract?literal.id=myid on the
command line for SimplePostTool.

Likewise, I am not that familiar with Solarium. Are you sure that the
file is being POSTed to /solr/update/extract? Are you seeing any
errors in your Solr logs?
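Assembled into one command, the suggestion above would look roughly like this (the literal.id value is a placeholder, and note that in Solr 5.x the handler path normally includes the core name):

```
java -classpath example\exampledocs\post.jar -Dauto=yes -Drecursive=yes ^
  "-Durl=http://localhost:8983/solr/solr_docs_core/update/extract?literal.id=myid" ^
  org.apache.solr.util.SimplePostTool D:\Lucene\document
```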

Regards,
Gora


Re: migrate(or copy) data from one core1(node2) to anothere core2(node1)

2015-12-02 Thread Mugeesh Husain
Thanks Erick,

I am doing join operations across multiple cores in SolrCloud mode.

>> After that's done, and assuming you want to add replicas in the SolrCloud
>> version for HA/DR/Performance reasons, use the ADDREPLICA Collections API
>> command.

If I split a core into shards, is there any way to use joins across multiple
shards?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/migrate-Copy-data-from-one-core1-server1-to-another-core2-server2-tp4243159p4243210.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
I fixed it, but there is still a small problem with the 30-second timeout of
the WAMP server, so I can only put 130 files at a time in the directory to
index, until I have indexed all my files. This is my function to index documents:

App::import('Vendor', 'autoload', array('file' => 'solarium/vendor/autoload.php'));

public function indexDocument() {
    $config = array(
        "endpoint" => array("localhost" => array(
            "host" => "127.0.0.1",
            "port" => "8983",
            "path" => "/solr",
            "core" => "demo",
        ))
    );
    $start = microtime(true);

    if ($_POST) {
        // create a client instance
        $client = new Solarium\Client($config);
        $dossier = $this->request->data['User']['dossier'];
        $dir = new Folder($dossier);
        $files = $dir->find('.*\.*');

        $headers = array('Content-Type:multipart/form-data');

        foreach ($files as $file) {
            $file = new File($dir->pwd() . DS . $file);

            $query = $client->createExtract();
            $query->setFile($file->pwd());
            $query->setCommit(true);
            $query->setOmitHeader(false);

            $doc = $query->createDocument();
            $doc->id = $file->pwd();
            $doc->name = $file->name;
            $doc->title = $file->name();

            $query->setDocument($doc);

            $request = $client->createRequest($query);
            $request->addHeaders($headers);

            $result = $client->executeRequest($request);
        }
    }

    $this->set(compact('start'));
}


2015-12-02 16:42 GMT+00:00 kostali hassan :

> yes I am sure because i successeflly Post the same document(455 .doc .docx
> and pdf in 18 second) with SimplePostTool
> But now i want to commincate directly with my server solr using solarium
> in my application cakephp ; I think only way to have the right encoding is
> in header :
> *$headers = array('Content-Type:multipart/form-data');*
> * I guess it will *working if the time of indexing is not depassing 30
> second from time out of wamp server.
>
> 2015-12-02 16:32 GMT+00:00 Gora Mohanty :
>
>> On 2 December 2015 at 21:55, kostali hassan 
>> wrote:
>> > yes they are a Error in my solr logs:
>> > SolrException URLDecoder: Invalid character encoding detected after
>> > position 79 of query string / form data (while parsing as UTF-8)
>> > <
>> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
>> >
>> > this is my post in stack overflow :
>> >
>> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
>>
>> Looks like an encoding error all right. Are you very sure that you can
>> sucessfully POST the same document with SimplePostTool. If so, I would
>> guess that you are not using Solarium correctly, i.e., the PDF file is
>> getting POSTed such that Solr is getting the raw content rather than
>> the extracted content.
>>
>> Regards,
>> Gora
>>
>
>


Re: indexing rich data from directory using solarium

2015-12-02 Thread Gora Mohanty
On 2 December 2015 at 22:35, kostali hassan  wrote:
> i fixed but he still a smal prb from time out 30sc of wamp server then i
> can just put 130files to a directory to index untill i index all my files :
> this is my function idex document:

Again, not familiar with Solarium, and at this point you are probably
better off asking on a Solarium-specific list, but my guess is that
you need keepalive on the connection. It seems that Solarium's
ZendHttpServer does this:
http://wiki.solarium-project.org/index.php/V1:Client_adapters .
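One more hedged observation: the 30-second limit in this thread looks like PHP's default max_execution_time under WAMP (an assumption, not something confirmed in the thread). Raising it is a one-line config change:

```
; php.ini under WAMP - raise (or set 0 to disable) the script time limit, then restart Apache
max_execution_time = 300
```

Alternatively, calling set_time_limit(0); at the top of the indexing action lifts the limit for that request only.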

Regards,
Gora


Solr extract performance

2015-12-02 Thread kostali hassan
I am looking for the optimal way to extract and commit rich data from a
directory containing many files (MS Word and PDF), because I have a problem
with the 30-second timeout in the WAMP server.
This is my function to index documents in CakePHP using Solarium:

App::import('Vendor', 'autoload', array('file' => 'solarium/vendor/autoload.php'));

public function indexDocument() {
    $config = array(
        "endpoint" => array("localhost" => array(
            "host" => "127.0.0.1",
            "port" => "8983",
            "path" => "/solr",
            "core" => "demo",
        ))
    );
    $start = microtime(true);

    if ($_POST) {
        // create a client instance
        $client = new Solarium\Client($config);
        $dossier = $this->request->data['User']['dossier'];
        $dir = new Folder($dossier);
        $files = $dir->find('.*\.*');

        $headers = array('Content-Type:multipart/form-data');

        foreach ($files as $file) {
            $file = new File($dir->pwd() . DS . $file);

            $query = $client->createExtract();
            $query->setFile($file->pwd());
            $query->setCommit(true);
            $query->setOmitHeader(false);

            $doc = $query->createDocument();
            $doc->id = $file->pwd();
            $doc->name = $file->name;
            $doc->title = $file->name();

            $query->setDocument($doc);

            $request = $client->createRequest($query);
            $request->addHeaders($headers);

            $result = $client->executeRequest($request);
        }
    }

    $this->set(compact('start'));
}
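One likely contributor to the timeout (an observation about the code above, not a tested fix): setCommit(true) inside the loop forces Solr to commit once per file, which is expensive. A sketch of committing once per batch instead, using Solarium's update query:

```php
// inside the loop: stop committing per file
$query->setCommit(false);

// after the loop: one explicit commit for the whole batch
$update = $client->createUpdate();
$update->addCommit();
$client->update($update);
```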


Searching and sorting using field aliasing

2015-12-02 Thread Mahmoud Almokadem
Hi all, 

I have two cores (core1, core2). core1 contains fields (f1, f2, f3, date1) and 
core2 contains fields (f2, f3, f4, date2).
I want to search across the two cores on the date field. Is there an alias to 
query the two fields in a distributed search?

For example, when q=dateField:NOW, perform the search on date1 and date2. And I 
want to sort on dateField, which sorts date1 and date2.

Regards,
Mahmoud 

Re: migrate(or copy) data from one core1(node2) to anothere core2(node1)

2015-12-02 Thread Erick Erickson
How are you splitting your core into shards? And why? You should only shard
when you cannot get reasonable performance on a single shard. To increase
queries-per-second, simply add replicas.

And if at all possible, it would be much less error-prone to just re-index
your data into a collection that was created with shards.

Really, this seems like an XY problem. You're asking about specific actions
without explaining what you want to accomplish at a high level.

So why are you going to SolrCloud in the first place? Do you want to go
from a single index to a sharded one? Why? What do you expect to gain?

Best,
Erick

On Wed, Dec 2, 2015 at 9:37 AM, Mugeesh Husain  wrote:
> Thanks Erick,
>
> I am making join operation for multiple core in solrcloud mode.
>
>>>After that's done, and assuming you want to add replicas in the SolrCloud
> version for HA/DR/Performance reasons, use the ADDREPLICA  Collections API
> command.
>
> If i split core into shard  then there is any way to use join in multiple
> shards.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/migrate-Copy-data-from-one-core1-server1-to-another-core2-server2-tp4243159p4243210.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
Yes, I am sure, because I successfully POSTed the same documents (455 .doc,
.docx and PDF files in 18 seconds) with SimplePostTool.
But now I want to communicate directly with my Solr server using Solarium in
my CakePHP application; I think the only way to get the right encoding is in
the header:
$headers = array('Content-Type:multipart/form-data');
I guess it will work if the indexing time does not exceed the 30-second
timeout of the WAMP server.

2015-12-02 16:32 GMT+00:00 Gora Mohanty :

> On 2 December 2015 at 21:55, kostali hassan 
> wrote:
> > yes they are a Error in my solr logs:
> > SolrException URLDecoder: Invalid character encoding detected after
> > position 79 of query string / form data (while parsing as UTF-8)
> > <
> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
> >
> > this is my post in stack overflow :
> >
> http://stackoverflow.com/questions/34017889/solrexception-urldecoder-invalid-character-encoding-detected-after-position-79
>
> Looks like an encoding error all right. Are you very sure that you can
> sucessfully POST the same document with SimplePostTool. If so, I would
> guess that you are not using Solarium correctly, i.e., the PDF file is
> getting POSTed such that Solr is getting the raw content rather than
> the extracted content.
>
> Regards,
> Gora
>


Re: Error on DIH log

2015-12-02 Thread Gora Mohanty
On 27 November 2015 at 11:12, Midas A  wrote:
> Error:
> org.apache.solr.common.SolrException: ERROR: [doc=83629504] Error adding
> field 'master_id'='java.math.BigInteger:0' msg=For input string:
> "java.math.BigInteger:0"

Sorry, was busy the last few days. On a closer look, it seems that
there is an issue with java.math.BigInteger fields not being
serialised properly, i.e., the input seems to be
"java.math.BigInteger:0" rather than 0.

Which version of Solr are you using? This might have to do with
https://issues.apache.org/jira/browse/SOLR-6165
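If it is indeed this serialisation problem, a common DIH-side workaround (a sketch assuming a MySQL source; the entity name, table, and other column names are placeholders) is to cast the column in the SELECT so the JDBC driver never hands DIH a BigInteger:

```xml
<entity name="doc"
        query="SELECT CAST(master_id AS SIGNED) AS master_id, other_col FROM my_table"/>
```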

Regards,
Gora


Problems integrating Uima with solr

2015-12-02 Thread vaibhavlella
I followed these steps but am getting warnings.

Step 1: Set the <lib> tags in solrconfig.xml appropriately to point to those
jar files.
Step 2: Modified solrconfig.xml by adding the following snippet with working
API keys:

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
      </lst>
      <str name="analysisEngine">/Users/vaibhavlella/solr/solr-5.3.1/tester/OverridingParamsExtServicesAE.xml</str>
      <bool name="ignoreErrors">false</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>body</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
          <lst name="mapping">
            <str name="feature">text</str>
            <str name="field">concept</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
          <lst name="mapping">
            <str name="feature">language</str>
            <str name="field">language</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.SentenceAnnotation</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentence</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Step 3: Modified schema.xml by adding the target fields used in the mappings
above, along the lines of:

<field name="language" type="string" indexed="true" stored="true" required="false"/>
<field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
<field name="sentence" type="text_general" indexed="true" stored="true" multiValued="true" required="false"/>
Step 4: Created a new update request handler with the following (NOTE: using
solr.UpdateRequestHandler instead of solr.XmlUpdateRequestHandler since an
error was being generated):

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>
Step 5: Tried to index the books.json that is given in exampledocs with Solr,
got the following warnings, and no new fields were generated:
$ bin/solr start -s tester/solr
$ bin/post -c newcore tester/books.json
The following was the response:
java -classpath /Users/vaibhavlella/solr/solr-5.3.1/dist/solr-core-5.3.1.jar
-Dauto=yes -Dc=newcore -Ddata=files org.apache.solr.util.SimplePostTool
tester/books.json
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/newcore/update...
Entering auto mode. File endings considered are
xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.json (application/json) to [base]
SimplePostTool: WARNING: Solr returned an error #500 (Server Error) for url:
http://localhost:8983/solr/newcore/update
SimplePostTool: WARNING: Response:
{"responseHeader":{"status":500,"QTime":6534},"error":{"msg":"processing
error null. id=978-0641723445, 
text=\"null...\"","trace":"org.apache.solr.common.SolrException: processing
error null. id=978-0641723445,  text=\"null...\"\n\tat
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:127)\n\tat
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:470)\n\tat
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:134)\n\tat
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:113)\n\tat
org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:76)\n\tat
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)\n\tat
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat

Failed to create collection in Solrcloud

2015-12-02 Thread Mugeesh Husain
Hi,

I am using 3 servers: solr1, solr2 and solr3.

I have set up 3 ZooKeeper instances on server solr2.

When I create a collection with 1 shard and 2 replicas, it works fine,
but when I try to create one with 1 shard and 3 replicas, using this
command: bin/solr create -c abc  -n abcr -shards 1 -replicationFactor 3

I am getting the error below:

 ERROR: Failed to create collection 'abc' due to:
org.apache.solr.client.solrj.SolrServerException:IOException occured when
talking to server at: http://xx.yyy.zz:8985/solr (server solr3)

solr3: I started this server using this command bin/solr start -cloud -p
8985 -s "example/cloud/node1/solr" -z solr2:2181,solr2:2182,solr3:2183

What is the issue? I am unable to solve it.

Thanks
Mugeesh







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Failed-to-create-collection-in-Solrcloud-tp4243232.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Auto-Complete

2015-12-02 Thread Salman Ansari
Sounds good, but I heard the "/suggest" component is the recommended way of
doing auto-complete in the newer versions of Solr, something along the lines
of this article:
https://cwiki.apache.org/confluence/display/solr/Suggester


  
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
  


Can someone confirm this?

Regards,
Salman


On Wed, Dec 2, 2015 at 1:14 PM, Alessandro Benedetti 
wrote:

> Hi Salman,
> I agree with Alan.
> Just configure your schema with the proper analysers .
> For the field you want to use for suggestions you are likely to need simply
> this fieldType :
>
> <fieldType name="text_suggest" class="solr.TextField"
> positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> maxGramSize="20"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> This is a very simple example, please adapt it to your use case.
>
> Cheers
>
> On 2 December 2015 at 09:41, Alan Woodward  wrote:
>
> > Hi Salman,
> >
> > It sounds as though you want to do a normal search against a special
> > 'suggest' field, that's been indexed with edge ngrams.
> >
> > Alan Woodward
> > www.flax.co.uk
> >
> >
> > On 2 Dec 2015, at 09:31, Salman Ansari wrote:
> >
> > > Hi,
> > >
> > > I am looking for auto-complete in Solr but on top of just auto
> complete I
> > > want as well to return the data completely (not just suggestions), so I
> > > want to get back the ids, and other fields in the whole document. I
> tried
> > > the following 2 approaches but each had issues
> > >
> > > 1) Used the /suggest component but that returns a very specific format
> > > which looks like I cannot customize. I want to return the whole
> document
> > > that has a matching field and not only the suggestion list. So for
> > example,
> > > if I write "hard" it returns the results in a specific format as
> follows
> > >
> > >   hard drive
> > > hard disk
> > >
> > > Is there a way to get back additional fields with suggestions?
> > >
> > > 2) Tried the normal /select component but that does not do
> auto-complete
> > on
> > > portion of the word. So, for example, if I write the query as "bara" it
> > > DOES NOT return "barack obama". Any suggestions how to solve this?
> > >
> > >
> > > Regards,
> > > Salman
> >
> >
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Different Similarities for the same field

2015-12-02 Thread Scott Stults
I haven't tried this before (overriding default similarity in a custom
SearchComponent), but it looks like it should be possible. In
QueryComponent.process() you can get a hold of the SolrIndexSearcher and
call setSimilarity(). It also looks like this is set only once by default
when the searcher is created, so you may need to set it back to the default
similarity when you're done.


k/r,
Scott

On Tue, Nov 24, 2015 at 10:25 AM, Markus, Sascha 
wrote:

> Hi,
> I implemented a Similarity which is based on the DefaultSimilarity changing
> the calculation for the idf.
> To work with this CustomSimilarity and the DefaultSimilarity from our
> application I have one field with the default and a copyfield with my
> similarity.
> Concerning the extra space needed for this field I wonder if there is a way
> to have my similarity or the default one on the SAME field. Because there
> are no differences for the index. E.g. by creating a SearchComponent to
> have something like solr/mySelect for queries with my similarity and the
> usual solr/select for the default similarity?
> How could I achieve this? Does anybody have a hint?
>
> Cheers,
>  Sascha
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Grouping by simhash signature

2015-12-02 Thread Nickolay41189
I am trying to implement near-duplicate (NearDup) detection with the
SimHash algorithm in Solr.
Let's say:
1) each document has a field /simhash_signature/ that stores a sequence of
bits.
2) that in order to be considered NearDup, documents must have, at most, 2
bits that differ in /simhash_signature/


*My question:*
How can I get groups of nearDup by /simhash_signature/?

*Examples:*
  Input:
Doc A = 0001000
Doc B = 100
Doc C = 111
Doc D = 0101000
  Output:
A -> {B, D}
B -> {A}
C -> {}
D -> {A}
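Outside Solr, the grouping itself reduces to pairwise Hamming distance over the signatures. A minimal Python sketch (the signatures below are illustrative 7-bit strings, not the ones above, and the brute-force pairwise comparison is only workable for small sets):

```python
def hamming(a, b):
    # number of bit positions where two equal-length signatures differ
    return sum(x != y for x, y in zip(a, b))

def near_dup_groups(docs, max_bits=2):
    # for each doc, collect the others whose signature differs
    # in at most max_bits positions
    return {
        name: sorted(
            other for other in docs
            if other != name and hamming(sig, docs[other]) <= max_bits
        )
        for name, sig in docs.items()
    }

docs = {
    "A": "0001000",
    "B": "0001100",
    "C": "1110111",
    "D": "0101000",
}
groups = near_dup_groups(docs)
# "A" groups with "B" (1 differing bit) and "D" (1 bit); "C" matches nothing
```

For large collections one would normally avoid the O(n^2) scan, e.g. by indexing permuted or banded chunks of the signature as separate fields so Solr can retrieve a small candidate set first.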



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-by-simhash-signature-tp4243236.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is it possible to sort on a BooleanField?

2015-12-02 Thread Clemens Wyss DEV
Looks like not. I get to see
'can not sort on a field which is neither indexed nor has doc values: '

- Clemens


Re: Failed to create collection in Solrcloud

2015-12-02 Thread Zheng Lin Edwin Yeo
Hi Mugeesh,

Which version of Solr and ZooKeeper are you using?

> i did start this server using this command bin/solr start -cloud -p
> 8985 -s "example/cloud/node1/solr" -z solr2:2181,solr2:2182,solr3:2183

Shouldn't the command be "bin/solr start -cloud -p 8985 -s
"example/cloud/node1/solr" -z solr1:2181,solr2:2182,solr3:2183"?

Regards,
Edwin



On 3 December 2015 at 03:47, Mugeesh Husain  wrote:

> Hi,
>
> I am using 3 servers: solr1, solr2 and solr3.
>
> I have set up 3 ZooKeeper instances on server solr2.
>
> When I create a collection with 1 shard and 2 replicas, it works fine,
> but when I try to create one with 1 shard and 3 replicas, using this
> command: bin/solr create -c abc  -n abcr -shards 1 -replicationFactor 3
>
> I am getting the error below:
>
>  ERROR: Failed to create collection 'abc' due to:
> org.apache.solr.client.solrj.SolrServerException:IOException occured when
> talking to server at: http://xx.yyy.zz:8985/solr (server solr3)
>
> solr3: I started this server using this command bin/solr start -cloud
> -p
> 8985 -s "example/cloud/node1/solr" -z solr2:2181,solr2:2182,solr3:2183
>
> What is the issue? I am unable to solve it.
>
> Thanks
> Mugeesh
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Failed-to-create-collection-in-Solrcloud-tp4243232.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Auto-Complete

2015-12-02 Thread Andrea Gazzarini
Hi Salman,
A few months ago I was involved in a project similar to map.geoadmin.ch,
and there I had the same need as you (I also sent an email to this list).

From my side I can further confirm what Alan and Alessandro already
explained; I followed that approach.

IMHO, that is the "recommended way" if the component's features meet your
needs (i.e. do not reinvent the wheel), but it seems you're outside those
bounds.

Best,
Andrea
On 2 Dec 2015 21:51, "Salman Ansari"  wrote:

> Sounds good but I heard "/suggest" component is the recommended way of
> doing auto-complete in the new versions of Solr. Something along the lines
> of this article
> https://cwiki.apache.org/confluence/display/solr/Suggester
>
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">mySuggester</str>
>     <str name="lookupImpl">FuzzyLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">cat</str>
>     <str name="weightField">price</str>
>     <str name="suggestAnalyzerFieldType">string</str>
>     <str name="buildOnStartup">false</str>
>   </lst>
> </searchComponent>
>
> Can someone confirm this?
>
> Regards,
> Salman
>
>
> On Wed, Dec 2, 2015 at 1:14 PM, Alessandro Benedetti <
> abenede...@apache.org>
> wrote:
>
> > Hi Salman,
> > I agree with Alan.
> > Just configure your schema with the proper analysers .
> > For the field you want to use for suggestions you are likely to need
> simply
> > this fieldType :
> >
> > <fieldType name="text_suggest" class="solr.TextField"
> > positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> > maxGramSize="20"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > This is a very simple example, please adapt it to your use case.
> >
> > Cheers
> >
> > On 2 December 2015 at 09:41, Alan Woodward  wrote:
> >
> > > Hi Salman,
> > >
> > > It sounds as though you want to do a normal search against a special
> > > 'suggest' field, that's been indexed with edge ngrams.
> > >
> > > Alan Woodward
> > > www.flax.co.uk
> > >
> > >
> > > On 2 Dec 2015, at 09:31, Salman Ansari wrote:
> > >
> > > > Hi,
> > > >
> > > > I am looking for auto-complete in Solr but on top of just auto
> > complete I
> > > > want as well to return the data completely (not just suggestions),
> so I
> > > > want to get back the ids, and other fields in the whole document. I
> > tried
> > > > the following 2 approaches but each had issues
> > > >
> > > > 1) Used the /suggest component but that returns a very specific
> format
> > > > which looks like I cannot customize. I want to return the whole
> > document
> > > > that has a matching field and not only the suggestion list. So for
> > > example,
> > > > if I write "hard" it returns the results in a specific format as
> > follows
> > > >
> > > >   hard drive
> > > > hard disk
> > > >
> > > > Is there a way to get back additional fields with suggestions?
> > > >
> > > > 2) Tried the normal /select component but that does not do
> > auto-complete
> > > on
> > > > portion of the word. So, for example, if I write the query as "bara"
> it
> > > > DOES NOT return "barack obama". Any suggestions how to solve this?
> > > >
> > > >
> > > > Regards,
> > > > Salman
> > >
> > >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>


Re: Is it possible to sort on a BooleanField?

2015-12-02 Thread Muhammad Zahid Iqbal
Please share your schema.

On Thu, Dec 3, 2015 at 11:28 AM, Clemens Wyss DEV 
wrote:

> Looks like not. I get to see
> 'can not sort on a field which is neither indexed nor has doc values:
> '
>
> - Clemens
>


Re: Protect against duplicates with the Migrate statement

2015-12-02 Thread philippa griggs
I used two fields to set up the signature, the unique Id and a time stamp field.

As it's in test, I set it up, cleared all the data out of both collections,
and reloaded it. I could see the signature which was created. I then
migrated into the cold collection, which already had documents with the
same unique id and signature.
I ended up with duplicates in the cold collection.

Thanks for your help,

Philippa


From: Zheng Lin Edwin Yeo 
Sent: 03 December 2015 02:30:31
To: solr-user@lucene.apache.org
Subject: Re: Protect against duplicates with the Migrate statement

Hi Philippa,

Which field did you use to set it as SignatureField in your ColdDocuments
when you implement the de-duplication?

Regards,
Edwin


On 2 December 2015 at 18:59, philippa griggs 
wrote:

> Hello,
>
>
> I'm using Solr 5.2.1 and Zookeeper 3.4.6.
>
>
> I'm implementing two collections - HotDocuments and ColdDocuments . New
> documents will only be written to HotDocuments and every night I will
> migrate a chunk of documents into ColdDocuments.
>
>
> In the test environment, I have the Collection API migrate statement
> working fine. I know this won't handle duplicates ending up in the
> ColdDocuments collection and I don't expect to have duplicate documents but
> I would like to protect against it- just in case.
>
>
> We have a unique key and I've tried to implement de-duplication (
> https://cwiki.apache.org/confluence/display/solr/De-Duplication) but I
> still end up with duplicates in the ColdDocuments collection.
>
>
>
> Does anyone have any suggestions on how I can protect against duplicates
> with the migrate statement?  Any ideas would be greatly appreciated.
>
>
> Many thanks
>
> Philippa
>


AW: Is it possible to sort on a BooleanField?

2015-12-02 Thread Clemens Wyss DEV
...

...

...

Guess then I must set indexed="true" ;) Is it true that a BooleanField may
not have docValues?
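For reference, a sketch of what the sortable field definition could look
like in schema.xml (the field name here is made up, and whether BoolField
accepts docValues depends on the Solr version, so indexed="true" is the
conservative fix):

<field name="inStock" type="boolean" indexed="true" stored="true"/>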

-Ursprüngliche Nachricht-
Von: Muhammad Zahid Iqbal [mailto:zahid.iq...@northbaysolutions.net] 
Gesendet: Donnerstag, 3. Dezember 2015 08:01
An: solr-user
Betreff: Re: Is it possible to sort on a BooleanField?

Please share your schema.

On Thu, Dec 3, 2015 at 11:28 AM, Clemens Wyss DEV 
wrote:

> Looks like not. I get to see
> 'can not sort on a field which is neither indexed nor has doc values:
> '
>
> - Clemens
>


Solr Auto-Complete

2015-12-02 Thread Salman Ansari
Hi,

I am looking for auto-complete in Solr but on top of just auto complete I
want as well to return the data completely (not just suggestions), so I
want to get back the ids, and other fields in the whole document. I tried
the following 2 approaches but each had issues

1) Used the /suggest component but that returns a very specific format
which looks like I cannot customize. I want to return the whole document
that has a matching field and not only the suggestion list. So for example,
if I write "hard" it returns the results in a specific format as follows

  hard drive
hard disk

 Is there a way to get back additional fields with suggestions?

2) Tried the normal /select component but that does not do auto-complete on
portion of the word. So, for example, if I write the query as "bara" it
DOES NOT return "barack obama". Any suggestions how to solve this?


Regards,
Salman


indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
How can I index rich data (MS Word and PDF files) with Solarium from a
directory containing many files? My config is:

$config = array(
    "endpoint" => array(
        "localhost" => array(
            "host" => "127.0.0.1",
            "port" => "8983",
            "path" => "/solr",
            "core" => "demo",
        ),
    ),
);

I try this code:

$dir = new Folder($dossier);
$files = $dir->find('.*\.*');
foreach ($files as $file) {
    $file = new File($dir->pwd() . DS . $file);

    // Build an extract (Solr Cell) request for this file
    $query = $client->createExtract();
    $query->setFile($file->pwd());
    $query->setCommit(true);
    $query->setOmitHeader(false);

    $doc = $query->createDocument();
    // Note: the raw file path is used as the document id; if it is
    // not valid UTF-8 this can trigger the URLDecoder error below
    $doc->id = $file->pwd();
    $doc->name = $file->name;
    $doc->title = $file->name();
    $query->setDocument($doc);

    $result = $client->extract($query);
}

When i execute it i get this ERROR:

org.apache.solr.common.SolrException: URLDecoder: Invalid character
encoding detected after position 79 of query string / form data (while
parsing as UTF-8)


Re: indexing rich data from directory using solarium

2015-12-02 Thread Gora Mohanty
On 2 December 2015 at 16:32, kostali hassan  wrote:
[...]
>
> When i execute it i get this ERROR:
>
> org.apache.solr.common.SolrException: URLDecoder: Invalid character
> encoding detected after position 79 of query string / form data (while
> parsing as UTF-8)

Solr expects UTF-8 data. Your documents are probably in some different
encoding. You will need to figure out what the encoding is, and how to
convert it to UTF-8.
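To illustrate the failure mode, here is a small Python sketch (the byte 0xE9 is 'é' in Latin-1, but on its own it is not valid UTF-8, which is roughly what Solr's URLDecoder is rejecting):

```python
raw = b"caf\xe9"  # "café" encoded as Latin-1, not UTF-8

# Decoding Latin-1 bytes as UTF-8 fails, much like the Solr error above
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print("not valid UTF-8:", e)

# The fix: decode with the real source encoding, then re-encode as UTF-8
utf8_bytes = raw.decode("latin-1").encode("utf-8")
print(utf8_bytes)  # b'caf\xc3\xa9'
```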

Regards,
Gora


Re: Solr Auto-Complete

2015-12-02 Thread Alan Woodward
Hi Salman,

It sounds as though you want to do a normal search against a special 'suggest' 
field, that's been indexed with edge ngrams.

Alan Woodward
www.flax.co.uk


On 2 Dec 2015, at 09:31, Salman Ansari wrote:

> Hi,
> 
> I am looking for auto-complete in Solr but on top of just auto complete I
> want as well to return the data completely (not just suggestions), so I
> want to get back the ids, and other fields in the whole document. I tried
> the following 2 approaches but each had issues
> 
> 1) Used the /suggest component but that returns a very specific format
> which looks like I cannot customize. I want to return the whole document
> that has a matching field and not only the suggestion list. So for example,
> if I write "hard" it returns the results in a specific format as follows
> 
>   hard drive
> hard disk
> 
> Is there a way to get back additional fields with suggestions?
> 
> 2) Tried the normal /select component but that does not do auto-complete on
> portion of the word. So, for example, if I write the query as "bara" it
> DOES NOT return "barack obama". Any suggestions how to solve this?
> 
> 
> Regards,
> Salman



Re: Solr Auto-Complete

2015-12-02 Thread Alessandro Benedetti
Hi Salman,
I agree with Alan.
Just configure your schema with the proper analysers.
For the field you want to use for suggestions, you are likely to need
simply this fieldType:

<fieldType name="text_suggest" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
        maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This is a very simple example, please adapt it to your use case.
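To see why this makes prefix search work, here is a toy Python sketch of what an index-time edge n-gram filter emits (gram sizes are assumptions; in Solr the EdgeNGramFilterFactory does the real work):

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    # all prefixes of the token from min_gram up to max_gram characters
    return [token[:i] for i in range(min_gram, min(len(token), max_gram) + 1)]

grams = edge_ngrams("barack")
print(grams)  # ['b', 'ba', 'bar', 'bara', 'barac', 'barack']

# No n-gramming is applied at query time, so the user's partial input
# is matched directly against the indexed prefixes:
print("bara" in grams)  # True
```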

Cheers

On 2 December 2015 at 09:41, Alan Woodward  wrote:

> Hi Salman,
>
> It sounds as though you want to do a normal search against a special
> 'suggest' field, that's been indexed with edge ngrams.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 2 Dec 2015, at 09:31, Salman Ansari wrote:
>
> > Hi,
> >
> > I am looking for auto-complete in Solr but on top of just auto complete I
> > want as well to return the data completely (not just suggestions), so I
> > want to get back the ids, and other fields in the whole document. I tried
> > the following 2 approaches but each had issues
> >
> > 1) Used the /suggest component but that returns a very specific format
> > which looks like I cannot customize. I want to return the whole document
> > that has a matching field and not only the suggestion list. So for
> example,
> > if I write "hard" it returns the results in a specific format as follows
> >
> >   hard drive
> > hard disk
> >
> > Is there a way to get back additional fields with suggestions?
> >
> > 2) Tried the normal /select component but that does not do auto-complete
> on
> > portion of the word. So, for example, if I write the query as "bara" it
> > DOES NOT return "barack obama". Any suggestions how to solve this?
> >
> >
> > Regards,
> > Salman
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Protect against duplicates with the Migrate statement

2015-12-02 Thread philippa griggs
Hello,


I'm using Solr 5.2.1 and Zookeeper 3.4.6.


I'm implementing two collections - HotDocuments and ColdDocuments . New 
documents will only be written to HotDocuments and every night I will migrate a 
chunk of documents into ColdDocuments.


In the test environment, I have the Collection API migrate statement working 
fine. I know this won't handle duplicates ending up in the ColdDocuments 
collection and I don't expect to have duplicate documents but I would like to 
protect against it- just in case.


We have a unique key and I've tried to implement de-duplication 
(https://cwiki.apache.org/confluence/display/solr/De-Duplication) but I still 
end up with duplicates in the ColdDocuments collection.
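(For reference, the configuration described on that page looks roughly like
this in solrconfig.xml; the signature field name is illustrative, the source
fields mirror the unique id and timestamp, and with overwriteDupes set to
true a document whose signature already exists replaces the earlier copy:)

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">id,timestamp</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

(One caveat: this chain only applies to documents routed through the update
handler, so documents moved by other means may bypass it.)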



Does anyone have any suggestions on how I can protect against duplicates with 
the migrate statement?  Any ideas would be greatly appreciated.


Many thanks

Philippa


UpdateLogs in HDFS

2015-12-02 Thread Alan Woodward
Hi all,

As a step in SOLR-8282, I'm trying to get all access to the data directory done 
by Solr to be mediated through the DirectoryFactory implementation.  Part of 
this is the creation of the UpdateLog, and I'm a bit confused by some of the 
logic in there currently.

The UpdateLog is created by the UpdateHandler, which has some logic in there to 
determine whether or not to use a standard log or an HDFSUpdateLog.  In 
particular, around line 117, we check to see if the update log directory begins 
with "hdfs:/", and if it does we then do a further check to see if the 
directory factory is an HDFSDirectoryFactory or not.

This seems to imply that Solr currently supports storing the update log in HDFS 
even if the actual indexes are on a normal file system.  Which seems odd, at 
the very least.  All our docs say to use HDFSDirectoryFactory if you want to 
store anything in HDFS, and there's nothing anywhere about storing the update 
logs separately from the indexes.  Is this a relic of past behaviour, or is it 
something that a) should be preserved by the refactoring I'm doing, and b) 
documented and tested?

Alan Woodward
www.flax.co.uk




Re: Create Collection Admin Request - unable to specify collection configName

2015-12-02 Thread Kelly, Frank
Thank you everyone - this was EXACTLY my problem.
I was using a chroot for startup but not on the upload of configurations.
Now everything works as expected.

Thanks everyone!

-Frank



On 12/2/15, 12:10 AM, "Upayavira"  wrote:

>Adding /solr to the zk string 'namespaces' the data within a /solr
>directory inside ZooKeeper, which is a useful feature. It allows you to
>share zk between multiple applications. However, you must use the same
>chroot at startup and with zkcli. So either remove the /solr or add it
>to the zkcli lines also.
>
>Upayavira



Re: indexing rich data from directory using solarium

2015-12-02 Thread kostali hassan
Yes, that's logical, thank you. But I want to understand why the same data
indexes fine from the shell on Windows using SimplePostTool:
>
> D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar -Dauto=yes
> -Dc=solr_docs_core -Ddata=files -Drecursive=yes
> org.apache.solr.util.SimplePostTool D:\Lucene\document ;



2015-12-02 11:09 GMT+00:00 Gora Mohanty :

> On 2 December 2015 at 16:32, kostali hassan 
> wrote:
> [...]
> >
> > When i execute it i get this ERROR:
> >
> > org.apache.solr.common.SolrException: URLDecoder: Invalid character
> > encoding detected after position 79 of query string / form data (while
> > parsing as UTF-8)
>
> Solr expects UTF-8 data. Your documents are probably in some different
> encoding. You will need to figure out what the encoding is, and how to
> convert it to UTF-8.
>
> Regards,
> Gora
>


Re: Protect against duplicates with the Migrate statement

2015-12-02 Thread Zheng Lin Edwin Yeo
Hi Philippa,

Which field did you use to set it as SignatureField in your ColdDocuments
when you implement the de-duplication?

Regards,
Edwin


On 2 December 2015 at 18:59, philippa griggs 
wrote:

> Hello,
>
>
> I'm using Solr 5.2.1 and Zookeeper 3.4.6.
>
>
> I'm implementing two collections - HotDocuments and ColdDocuments . New
> documents will only be written to HotDocuments and every night I will
> migrate a chunk of documents into ColdDocuments.
>
>
> In the test environment, I have the Collection API migrate statement
> working fine. I know this won't handle duplicates ending up in the
> ColdDocuments collection and I don't expect to have duplicate documents but
> I would like to protect against it- just in case.
>
>
> We have a unique key and I've tried to implement de-duplication (
> https://cwiki.apache.org/confluence/display/solr/De-Duplication) but I
> still end up with duplicates in the ColdDocuments collection.
>
>
>
> Does anyone have any suggestions on how I can protect against duplicates
> with the migrate statement?  Any ideas would be greatly appreciated.
>
>
> Many thanks
>
> Philippa
>