[jira] [Commented] (ACCUMULO-4751) Some WALs don't replicate due to lacking a createdTime entry

2017-12-30 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307137#comment-16307137
 ] 

Christopher Tubbs commented on ACCUMULO-4751:

Just a quick note in case the git commit history is confusing: 

* this patch was applied to the 1.8 branch as commit 
0a7c465d9eace6b8c9dccdfd37a92a0e1bcfbd52
* a nearly identical commit was applied to the master branch as 
3161b98f05615844caf6980c8b5922375e92bd32 (with slight differences due to 
unrelated changes with table IDs being strongly typed in that branch)
* branch 1.8 was merged into the master branch as 
984dbf54b23634a4a461a54de07ca016f9430af0 to ensure that 1.8 could be cleanly 
merged into master in the future (this merge commit did not change any code in 
the master branch, but it did ensure that 
0a7c465d9eace6b8c9dccdfd37a92a0e1bcfbd52 is included in the master branch's 
history)


> Some WALs don't replicate due to lacking a createdTime entry
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-4751
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4751
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.7.3, 1.8.1
>            Reporter: Adam J Shook
>            Assignee: Adam J Shook
>              Labels: pull-request-available
>             Fix For: 1.8.2, 2.0.0
>
>         Attachments: repl_logs.txt
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> From what I can tell, the below error is thrown when no data for a particular 
> table is written to a WAL, but the file is closed.  This would be because the 
> {{Status}} entry from the {{StatusUtil}} for {{fileClosed}} is pre-built and 
> therefore does not have a {{createdTime}}.  This prevents a WAL from being 
> replicated until a {{createdTime}} entry is added manually.
> From the Accumulo master:
> {code}
> Status record ([begin: 0 end: 0 infiniteEnd: true closed: true]) for 
> hdfs://namenode:9000/accumulo/wal/tserver.example.com+31732/f922df9c-3ffc-49ee-8d0c-261c7a05fea2
>  in table 7l was written to metadata table which lacked createdTime
> {code}
> There are two solutions I have in mind:
> 1. Update the {{StatusUtil}} such that every returned {{Status}} object sets 
> the {{createdTime}} to {{System.currentTimeMillis}} if not explicitly given.
> 2. Update the Accumulo Master to set the {{createdTime}} to the WAL's 
> modification time in HDFS if the WAL is closed but there is no 
> {{createdTime}}.
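
To illustrate the first proposal in the quoted description, here is a minimal, self-contained sketch of why a pre-built record cannot carry a {{createdTime}} and what a factory that always stamps one might look like. {{SketchStatus}}, {{SketchStatusUtil}}, and all of their methods are hypothetical stand-ins invented for this example; they are not Accumulo's {{Status}} or {{StatusUtil}} classes.

```java
import java.util.OptionalLong;

// Hypothetical stand-in for a replication status record; NOT Accumulo's Status class.
final class SketchStatus {
    final boolean closed;
    final boolean infiniteEnd;
    final OptionalLong createdTime; // empty models a record that lacks createdTime

    SketchStatus(boolean closed, boolean infiniteEnd, OptionalLong createdTime) {
        this.closed = closed;
        this.infiniteEnd = infiniteEnd;
        this.createdTime = createdTime;
    }

    boolean hasCreatedTime() {
        return createdTime.isPresent();
    }
}

// Hypothetical factory mirroring the idea of StatusUtil; method names are illustrative.
final class SketchStatusUtil {
    // Built once and shared by every caller, so it cannot know when any
    // particular WAL was created.
    private static final SketchStatus PREBUILT_CLOSED =
            new SketchStatus(true, true, OptionalLong.empty());

    private SketchStatusUtil() {}

    // Models the problematic case: a shared record with no createdTime.
    static SketchStatus prebuiltFileClosed() {
        return PREBUILT_CLOSED;
    }

    // Builds a fresh record that carries the supplied creation time.
    static SketchStatus fileClosed(long createdTimeMillis) {
        return new SketchStatus(true, true, OptionalLong.of(createdTimeMillis));
    }

    // Defaults createdTime to the current wall-clock time when none is given.
    static SketchStatus fileClosedNow() {
        return fileClosed(System.currentTimeMillis());
    }
}
```

One trade-off of defaulting to the current time is that the stamp reflects when the record was built rather than when the WAL was actually created.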



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ACCUMULO-4751) Some WALs don't replicate due to lacking a createdTime entry

2017-12-06 Thread Adam J Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281030#comment-16281030
 ] 

Adam J Shook commented on ACCUMULO-4751:


[~elserj] I'm not certain we even need the block of code in 
{{DatafileManager}} that updates the metadata for unused WALs.  The solutions I 
see now would be to:

1. Use {{StatusUtil.openWithUnknownLength(System.currentTimeMillis())}} here to 
add a {{createdTime}} to the metadata
2. Add handling logic to the {{StatusMaker}} to set the {{createdTime}} of the 
WAL to its HDFS timestamp (#2 above) 
3. Delete this block entirely (it might be needed at other times?  I don't know 
the pipeline at this level well enough to know if we need to flag unused WALs)
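
A rough sketch of the fallback idea in option 2, under stated assumptions: {{CreatedTimeFallback}} and {{resolveCreatedTime}} are hypothetical names, and the local file system's last-modified time (via {{java.nio.file}}) stands in for the HDFS modification time so the example stays self-contained. Returning an empty value for a WAL that is still open is also an assumption of this sketch, not something decided in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

// Hypothetical helper; not part of Accumulo.
final class CreatedTimeFallback {
    private CreatedTimeFallback() {}

    /**
     * Picks the createdTime to use for a WAL's status record.
     *
     * @param recorded the createdTime found in the record, if any
     * @param closed   whether the record marks the WAL as closed
     * @param wal      path to the WAL file (a local path stands in for HDFS here)
     * @return the recorded time if present; otherwise the file's modification
     *         time when the WAL is closed; otherwise empty
     */
    static OptionalLong resolveCreatedTime(OptionalLong recorded, boolean closed, Path wal)
            throws IOException {
        if (recorded.isPresent()) {
            return recorded; // never override a time that was actually written
        }
        if (!closed) {
            return OptionalLong.empty(); // assumption: only closed WALs get the fallback
        }
        return OptionalLong.of(Files.getLastModifiedTime(wal).toMillis());
    }
}
```

Because the fallback only applies when no {{createdTime}} was recorded, it acts as a catch-all and leaves correctly written records untouched.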



[jira] [Commented] (ACCUMULO-4751) Some WALs don't replicate due to lacking a createdTime entry

2017-12-06 Thread Adam J Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280957#comment-16280957
 ] 

Adam J Shook commented on ACCUMULO-4751:


I can confirm that this is the case.  Looking at the timestamps from the 
TServer and the Master, you can see the Master remove the replication entries 
after replication has completed, but then [1] gets called via 
{{Tablet#minorCompact}} minutes later and inserts an entry without a 
{{createdTime}} into the metadata table; that entry is then the latest one in 
the table.

[1] 
https://github.com/apache/accumulo/blob/rel/1.8.1/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L422



[jira] [Commented] (ACCUMULO-4751) Some WALs don't replicate due to lacking a createdTime entry

2017-12-06 Thread Adam J Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280853#comment-16280853
 ] 

Adam J Shook commented on ACCUMULO-4751:


I have attached some logs tracking a particular WAL file.  You can see that it 
initially has a {{createdTime}}, but at some point a delete entry must have 
been written (note that the timestamp changes and the {{createdTime}} is gone), 
and then other entries were added.



[jira] [Commented] (ACCUMULO-4751) Some WALs don't replicate due to lacking a createdTime entry

2017-12-04 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277195#comment-16277195
 ] 

Josh Elser commented on ACCUMULO-4751:

bq. My initial finding is that some entries coming out of the StatusUtil don't 
have a createdTime. Whereas something like [1] writes a metadata entry to a WAL 
and the replication status is updated with a createdTime. I am guessing that, 
in this case, there is no ingest for a particular table during the lifetime of 
the WAL, so the work to replicate the WAL to a particular table (via the key 
extent) is never created. However, something else is writing some information 
that doesn't have a createdTime.

Yeah, I do recall this distinction. I remember having a separation between when 
a WAL is "created" in memory (more like the TServer decides it's going to use 
that file) and when there are actually records in that file. I could also see 
replication missing some part of the lifecycle of the WAL itself, as this is 
pretty obtuse (not explicitly documented). LMK if you need to knock heads 
together.



[jira] [Commented] (ACCUMULO-4751) Some WALs don't replicate due to lacking a createdTime entry

2017-12-04 Thread Adam J Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277154#comment-16277154
 ] 

Adam J Shook commented on ACCUMULO-4751:


My initial finding is that some entries coming out of the {{StatusUtil}} don't 
have a {{createdTime}}.  Whereas something like [1] writes a metadata entry to 
a WAL and the replication status is updated with a {{createdTime}}.  I am 
guessing that, in this case, there is no ingest for a particular table during 
the lifetime of the WAL, so the work to replicate the WAL to a particular table 
(via the key extent) is never created.  However, something else is writing some 
information that doesn't have a {{createdTime}}.

I'll do some more digging to see if this can be nailed down.

[1] 
https://github.com/apache/accumulo/blob/rel/1.8.1/server/tserver/src/main/java/org/apache/accumulo/tserver/log/TabletServerLogger.java#L390



[jira] [Commented] (ACCUMULO-4751) Some WALs don't replicate due to lacking a createdTime entry

2017-12-04 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277096#comment-16277096
 ] 

Josh Elser commented on ACCUMULO-4751:

I would say #2 at first glance, but I am more worried about how we missed the 
createdTime situation.

Ideally, even in the face of TServer crashes, the TServer would set the correct 
"metadata" on each Status record. Do you have any hunch as to how this record 
exists without the createdTime attribute set?

It would be nice to confirm that we don't have some other kind of bug lingering 
in which we're just not writing the record correctly. I wouldn't be surprised 
if we actually need some kind of solution (like #2) to guard against some 
kind of unlikely situation (e.g. tserver failure) in addition to another bug. 
In other words, a catch-all to prevent the system from "wedging" on these WALs 
would be appreciated.



[jira] [Commented] (ACCUMULO-4751) Some WALs don't replicate due to lacking a createdTime entry

2017-12-04 Thread Adam J Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277044#comment-16277044
 ] 

Adam J Shook commented on ACCUMULO-4751:


[~elserj] when you have the time, could you let me know your thoughts on 1 and 
2 above?  Happy to contribute a fix.
