Hello, please send resumes to [email protected], or reach me at 408-604-9252 ASAP.
*Job Description:*

*Title: Data Analyst - Sr*
*Location: Reston, VA*
*Pay rate: open*
*Duration: 3 months*

*Job Description:*
Responsible for data analysis, validation, cleansing, collection, and reporting. Extract and analyze data from various sources, including databases, manual files, and external websites. Respond to data inquiries from various groups within an organization. Create and publish regularly scheduled and/or ad hoc reports as needed. Document reporting requirements, and process and validate data components as required. Experience with relational databases and knowledge of query tools and/or statistical software required. Strong analytical and organizational skills required. Must possess expert-level knowledge of MS Excel. 6+ years of prior experience as a Data Analyst is required.

Skills
• More than 7 years of deep involvement with SQL and data warehouses.
• 2+ years of experience with HIVE.
• Working familiarity with Hadoop clusters.

Goal
The data warehouse is the most comprehensive set of domain-oriented transactions covering most of Verisign's operated TLDs. It consolidates data from both Core and the Namestore system to provide a rich set of data over all Verisign-operated TLDs (dotNAME will be added in 2013Q3). The data within the warehouse is currently utilized by a multitude of external-facing services such as IPS, Data Analyzer, and WhoWas, and by internal-facing services and efforts such as Strategy and domain renewal forecasting. The goal of this effort is to provide easy access, in the Compute Cluster environment, to current and relevant data fields currently stored within the Data Warehouse. This effort will help existing products utilize the additional data to extend or enhance their products and services, and eliminate the need for additional one-off data transfer scripts.
Work
We have determined that we will load into the Compute Cluster environment a one-time dump of a big table (~1B rows, 30+ fields) representing all the domain-related transactions that have ever occurred; on a daily basis, a much smaller table (100k-200k rows) of the day's transactions will be ingested from the Data Warehouse into the Compute Cluster. The work will cover the following 3 areas:

1. Propose and implement, in coordination with the Data Warehouse team, the method for generating the one-time dump and the daily increments and ingesting them into the Compute Cluster. Propose how the data should be stored in the Compute Cluster (e.g., use partitions, append updates into a large file, etc.).

2. Implement HIVE queries for the following user queries (to be executed in the Compute Cluster):
• What domains are not currently registered but have been registered in the past?
• Which domains were registered between time X and Y? And passed the AGP?
• Which domains expired between time X and Y?
• Which domains were active in a particular TLD at time X?
• What set of domains were previously registered?
• For a given domain, when was it last registered?
• How many different registrars has a given domain been registered by? Which ones?
• What are the temporal registration patterns of these (a list of) domains?
• How many times has this domain been renewed? Transferred? Deleted?
• How many total days has this domain been active?
• What were the domains in the zone for a specific date? Perhaps pre-calculate those for every date.

Additional user queries might be provided for implementation as HIVE queries. If possible, care should be taken to use only UDFs that are also available in IMPALA, as IMPALA might be used for faster query processing of the Data Warehouse data in the Compute Cluster. One known complication is the handling of time as a parameter for these queries, given HIVE's constraints in handling time.
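For the storage question in area (1), one common layout is to partition the transaction table by load date, so the one-time historical dump is bulk-loaded once and each daily extract lands in its own partition. A minimal HiveQL sketch; the table name, columns, and paths below are illustrative assumptions, not taken from this posting:

```sql
-- Hypothetical schema: one row per domain transaction, timestamps kept as
-- UNIX epoch seconds, partitioned by the day the extract was ingested.
CREATE TABLE IF NOT EXISTS dw_domain_transactions (
  domain_name   STRING,
  tld           STRING,
  op_type       STRING,   -- e.g. REGISTER / RENEW / TRANSFER / EXPIRE / DELETE
  registrar_id  STRING,
  tx_time_epoch BIGINT
)
PARTITIONED BY (load_date STRING)
STORED AS RCFILE;  -- a columnar format readable by both HIVE and IMPALA

-- Daily increment (100k-200k rows): load the day's extract into its own partition.
LOAD DATA INPATH '/ingest/dw/2013-07-01/'
  INTO TABLE dw_domain_transactions
  PARTITION (load_date = '2013-07-01');
```

Partitioning by load date keeps the daily ingest cheap (a file move plus a metastore update) and lets queries prune to recent partitions instead of rescanning the ~1B-row history.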
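The time-handling complication noted above can be sidestepped by storing timestamps as UNIX epoch seconds and converting the user's date parameters with unix_timestamp() at query time; that function exists in both HIVE and IMPALA, so the queries stay IMPALA-compatible. A sketch for the "which domains expired between time X and Y" query, assuming the hypothetical table and column names below (the ${hiveconf:...} substitution is a Hive CLI convenience; under IMPALA the date literals would be supplied directly):

```sql
-- Assumed schema: dw_domain_transactions(domain_name, tld, op_type,
-- tx_time_epoch BIGINT), with tx_time_epoch in UNIX epoch seconds.
-- unix_timestamp(str, pattern) turns a user-supplied date into epoch seconds,
-- so the range check is a plain numeric comparison.
SELECT domain_name, tld, tx_time_epoch
FROM dw_domain_transactions
WHERE op_type = 'EXPIRE'
  AND tx_time_epoch >= unix_timestamp('${hiveconf:start_date}', 'yyyy-MM-dd')
  AND tx_time_epoch <  unix_timestamp('${hiveconf:end_date}',   'yyyy-MM-dd');
```

The same pattern (half-open epoch range on tx_time_epoch) covers the other time-bounded queries in the list, such as registrations between X and Y or activity at time X.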
One option is to convert time in the Data Warehouse data into UNIX_EPOCH time, so that the HIVE queries can convert user-inputted dates into UNIX_EPOCH time for execution against the source data. Alternative recommendations will be appreciated.

3. Propose and implement the optimal ways for users to interact with the Data Warehouse data in the Compute Cluster, e.g.:
- pre-compute data sets that will be heavily reused
- create HIVE views for the queries in (2) that users can re-use with specific parameters
- use additional partitions for the source or pre-computed data

*----*
*Thank you,*
*Yuvi*
*Sr. Associate Talent Acquisition*
*Bartronics America*
*Ph: 408-604-9252*
*Email: [email protected]*
*Yahoo IM: yuvi.recruiter | Gtalk: yuvi1.recruiter*
