Also, I found this error at the end of the bin/nutch inject job:
2013-11-15 17:09:28,630 DEBUG org.apache.hadoop.mapred.TaskLogsTruncater: Cannot open /data/search/hadoop/hadoop-1.2.1/libexec/../logs/userlogs/job_201311151641_0008/attempt_201311151641_0008_m_000000_0/profile.out for reading. Continuing with other log files
java.io.FileNotFoundException: /data/search/hadoop/hadoop-1.2.1/libexec/../logs/userlogs/job_201311151641_0008/attempt_201311151641_0008_m_000000_0/profile.out (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
    at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:188)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-11-15 17:09:28,631 DEBUG org.apache.hadoop.mapred.TaskLogsTruncater: Cannot open /data/search/hadoop/hadoop-1.2.1/libexec/../logs/userlogs/job_201311151641_0008/attempt_201311151641_0008_m_000000_0/debugout for reading. Continuing with other log files
java.io.FileNotFoundException: /data/search/hadoop/hadoop-1.2.1/libexec/../logs/userlogs/job_201311151641_0008/attempt_201311151641_0008_m_000000_0/debugout (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
    at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:188)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
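
For anyone following along, here is a minimal sketch of how I read the mock check in the AccumuloStore snippet Lewis quoted below. The class name GoraMockCheck and the helpers parse and usesMock are made up for illustration and are not part of Gora or Nutch. Only the property key gora.datastore.accumulo.mock and the `mock == null || !mock.equals("true")` test come from this thread, and the sketch reads the full key directly rather than going through DataStoreFactory.findProperty.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class GoraMockCheck {
    // Property key as it appears in conf/gora.properties in this thread.
    static final String MOCK_PROPERTY = "gora.datastore.accumulo.mock";

    // Hypothetical helper: parses properties-file text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Hypothetical helper: negation of the condition in the quoted snippet.
    // Only the exact string "true" selects the mock branch; a missing
    // property or any other value selects the real ZooKeeper-backed branch.
    static boolean usesMock(Properties props) {
        String mock = props.getProperty(MOCK_PROPERTY);
        return mock != null && mock.equals("true");
    }

    public static void main(String[] args) {
        Properties props = parse(
            "gora.datastore.accumulo.mock=false\n"
            + "gora.datastore.accumulo.user=accumulo\n");
        // Names the branch the quoted snippet would take for these settings.
        System.out.println(usesMock(props) ? "MockInstance" : "ZooKeeperInstance");
    }
}
```

If I have this right, leaving the mock property out entirely behaves the same as setting it to false. Also, java.util.Properties keeps trailing whitespace in values, so a stray space after "true" would send you down the real-instance branch as well.
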
On Fri, Nov 15, 2013 at 12:16 PM, Jon Uhal <[email protected]> wrote:
> I was wrong. So changing the gora.datastore.accumulo.user property caused
> the inject to finish and on the command line, it looked like it was
> successful:
>
> 13/11/15 17:09:30 INFO crawl.InjectorJob: InjectorJob: total number of urls rejected by filters: 0
> 13/11/15 17:09:30 INFO crawl.InjectorJob: InjectorJob: total number of urls injected after normalization and filtering: 35
>
> But when I tried to run the bin/nutch generate command, it gives me the
> error:
>
> 13/11/15 17:12:36 ERROR store.AccumuloStore: org.apache.accumulo.core.client.AccumuloSecurityException: Error BAD_CREDENTIALS - Username or Password is Invalid
>
> Still trying to figure this out...
>
>
>
> On Fri, Nov 15, 2013 at 11:52 AM, Jon Uhal <[email protected]> wrote:
>
>> So I think I figured this out. I believe this had to do with the Nutch
>> conf/gora.properties settings for Accumulo. The default user was set to:
>>
>> gora.datastore.accumulo.user=root
>>
>> and after trying to clean up ZooKeeper, I was running into issues trying
>> to remove /accumulo from ZooKeeper. It looked like a permissions issue and
>> I ran across this:
>>
>>
>> http://mail-archives.apache.org/mod_mbox/accumulo-user/201309.mbox/%3CCAGUtCHqY9eKM-modotn8YRmGR6Aus=oQkT9ys-=+v7-=oof...@mail.gmail.com%3E
>>
>> I didn't realize there might be an accumulo user that was accessing
>> ZooKeeper. I updated Nutch's gora.properties file to have:
>>
>> gora.datastore.accumulo.user=accumulo
>>
>> and things look like they are working.
>>
>> I'm not sure if this is the only change that caused things to start
>> working, but it looks like things are getting injected successfully.
>>
>>
>> On Thu, Nov 14, 2013 at 4:33 PM, Lewis John Mcgibbney <
>> [email protected]> wrote:
>>
>>> Hi Jon,
>>>
>>> Glad to hear that you're making some more progress!
>>>
>>> On Thu, Nov 14, 2013 at 8:45 PM, <[email protected]>
>>> wrote:
>>>
>>> >
>>> > So I think it has to do with Accumulo somehow. I reverted the
>>> > conf/gora.properties setting for mock from false to:
>>> >
>>> > gora.datastore.accumulo.mock=true
>>> >
>>> > After re-building and re-running, the runtime deploy job completed
>>> > successfully. Trying to see if I can track down the issue.
>>> >
>>> >
>>> >
>>> I am not sure about this approach. Have you tried editing the
>>> gora.datastore.accumulo.zookeepers=localhost property to the IP for the
>>> Zookeeper(s) server? I am not certain that simulating a mock datastore is
>>> the way to go here.
>>> AccumuloStore contains the following code:
>>>
>>>     try {
>>>       if (mock == null || !mock.equals("true")) {
>>>         String instance = DataStoreFactory.findProperty(properties, this,
>>>             INSTANCE_NAME_PROPERTY, null);
>>>         String zookeepers = DataStoreFactory.findProperty(properties, this,
>>>             ZOOKEEPERS_NAME_PROPERTY, null);
>>>         conn = new ZooKeeperInstance(instance, zookeepers)
>>>             .getConnector(user, password);
>>>         authInfo = new AuthInfo(user, ByteBuffer.wrap(password.getBytes()),
>>>             conn.getInstance().getInstanceID());
>>>       } else {
>>>         conn = new MockInstance().getConnector(user, password);
>>>       }
>>>
>>> This to me indicates that if you want to create the persistent data store,
>>> you would set the mock property to false, which takes you into the if
>>> block. From there it is just looking up the configuration properties for
>>> the Accumulo instance, the ZooKeeper server(s), and the username and
>>> password from gora.properties.
>>> hth, please let us know how you get on... and also how the AccumuloStore is
>>> working. AFAIK it is one of the lesser used data stores so we are always
>>> keen to hear of user experiences, etc.
>>> Thanks
>>> Lewis
--
Jon Uhal